ABSTRACT

The work reported culminates research by the Project
on the Assessment and Analysis of Word Identification Skills in
Reading. The Word Identification Test battery was designed for
elementary school children, with attention to the major issues
pertaining to skills mastery and assessment that are raised in the
review of mastery learning. Five important areas were of concern in
the development of the battery: (a) the basis on which target skills
would be selected for inclusion; (b) facilitation of error analysis
by creating categorical distractors; (c) ease and efficiency of test
administration; (d) independence of the test battery from published
materials to lessen the likelihood of teachers teaching to the tests;
and (e) establishment of flexible standards for skills mastery based
on a global measure of comprehension, rather than on arbitrary cutoff
scores. The battery is comprised of five subtests within two major
skill areas, phonics and structural analysis. The battery is a valid,
reliable instrument, and is easy to administer. It can facilitate
diagnostic decisions about apportionment of instructional time on the
most frequently occurring phonics and structural elements.
Performance standards are provided for each subtest. (Author/GK)

Abstract

The work reported in this paper culminates four years of research 
by the Project on the Assessment and Analysis of Word Identification 
Skills in Reading. The goals of the research have been to: (a) explore
the relationships between the mastery of word identification skills and
comprehension abilities; (b) develop a set of diagnostic subtests
which assess the word identification skills of elementary school children;
and (c) establish empirically based mastery levels for each subtest,
based on performance scores stratified by grade level and comprehension
ability.

In spring 1980, the final version of the Word Identification Test 
battery and the Reading Subtest of the Metropolitan Achievement Tests 
were administered to approximately 100 children at each grade level, one 
through five. The data were used to examine correlations between word 
identification skills, as measured by the various word identification 
subtests, and global comprehension ability, as measured by the standard- 
ized Metropolitan reading subtest. In addition, levels of skills 
mastery for each of the five subskills assessed in the battery were
established. This report presents some historical perspectives on 
word identification skills, documents the development of the test items, 
and summarizes the results of the analyses. 

The Word Identification Test battery is comprised of five subtests 
within two major skills areas: phonics and structural analysis. The 
subtests in the battery are unique in that all target items (letter-sound
correspondences, inflected endings, affixes, and contractions & possessives)
were developed, whenever possible, in accordance with word frequency
information and, hence, reflect the most frequently occurring phonic
and structural features of the English language.

As part of the process to establish performance guidelines for the 
Word Identification Test battery, an extensive review of mastery learning
theory was conducted and the issue of mastery learning theory and its
application to reading instruction examined. Traditionally, cutoff scores
or mastery levels have been arbitrarily set by publishers of tests and
appear to be absolute. For example, in many skills management programs,
a score of 80 percent or better indicates mastery of a particular skill.
To date, however, there has been no empirical verification that a single
percentage correct score should indicate mastery for all skills. The
Word Identification Test battery uses a unique approach for the
establishment of mastery levels: instead of a single absolute criterion for
mastery, the performance guidelines for each subtest in the battery take
into account a child's grade level and comprehension ability. Using
subpopulations stratified by global comprehension ability, performance
standards are provided for each subtest in the battery for every grade
level tested. These empirically derived performance guidelines range
from 34.3% to 96.4%, depending on the subskill being measured and the
grade level and comprehension ability of the student. More eloquently
than any argument appearing in the literature, this range of expected
performance demonstrates the inappropriateness of arbitrary, rigid mastery
scores.

INTRODUCTION



The work reported in this paper culminates four years of research 
by the Project on the Assessment and Analysis of Word Identification Skills
in Reading. The focus of the research has been to (a) explore the
relationships between the mastery of word identification skills and
comprehension abilities; (b) develop a set of diagnostic subtests which
assess the word identification skills of elementary school children; and
(c) establish empirically based mastery levels for each subtest,
based on the performance of groups of children stratified by grade level
and comprehension ability. The test battery is comprised of five subtests
within two major skill areas: phonics and structural analysis.

The subtests in the Word Identification Test battery are unique in 
that decisions regarding the specific information to be assessed
(letter-sound correspondences, inflected endings, affixes, and contractions
& possessives) were based, whenever possible, on frequency data. Hence,
the subtests assess the most frequently occurring phonic and structural
features of the English language. In addition, the formats of the
subtests eliminate children's prior knowledge of vocabulary as a confounding
factor in performance.

The mastery levels (performance guidelines) for each subtest, 
determined by comprehension performance and grade level rather than by 
arbitrary cutoff scores, will be of value to teachers for obtaining 
diagnostic information. The Word Identification Test battery will thus 
provide teachers with important information upon which to base instruction 
in the word identification skills most related to reading comprehension. 

During the last decade, increased attention has been given to the
individualization of instruction and to teacher accountability for pupils
to achieve minimal competency in reading. The result has been a growing
emphasis on skills development in reading. In an effort to individualize
instruction, particularly in the basic skills area of reading, diagnostic
testing has become more prevalent and, consequently, more time-consuming.
Within the past ten years, over a dozen programs have been developed which
are essentially skills management systems (e.g., the Wisconsin Design
for Reading Skill Development). Furthermore, most basal reading series
published during this period have included a substantial skills
management component. The numerous skills management systems and basal reading
programs have all made heavy use of criterion-referenced testing for
assessment of the various subskills of reading.

In line with this current emphasis on the diagnostic assessment of
reading skills, the present study sought to identify those word
identification skills which correlate most highly with reading comprehension,
to examine methods of assessing word identification skills, and to
develop a set of valid and reliable diagnostic tests to assess these
skills.

Over the last few years, considerable attention has been given to
defining, assessing, analyzing, and teaching the three fundamental
components of word identification: phonic analysis, structural analysis,
and contextual analysis¹ (Johnson & Pearson, 1978). Because the
current information concerning word identification has been based on
speculation only, an empirical investigation of these issues is clearly
warranted.

¹ Phonic analysis: processes which help children pronounce unfamiliar
printed words as an aid to understanding their meanings. Structural
analysis (morphemic analysis): processes which help children determine
the meanings of unfamiliar printed words by discerning their meaningful
parts. Contextual analysis: processes which help children understand
the meanings of printed words or phrases which are unfamiliar to them,
or which help children learn new meanings for familiar words and phrases,
by attending to the context of the material surrounding the given word
or phrase.

First, while reading educators agree that word identification
skills are important for reading, it is not clear which correspondences,
patterns, or strategies within each of the broad areas of phonics,
structure, and context relate most closely to comprehension. For example,
how necessary is it for children to know the 61 vowel clusters in the
English language and the 2 to 14 pronunciations for each (e.g., ou as
in soup, would, ground, sought)? Is it worthwhile to spend instructional
time teaching the rule which governs the pronunciation of x according
to its position within a word (i.e., xylophone, exam, tax)? Can we
justify the numerous hours spent drawing short lines between syllables,
because "better syllabicators are better readers"?

In the schools, a myriad of rules governing letter-sound
correspondences and syllabication are taught, but many of these rules have
exceptions or are completely erroneous. For example, the rule, "when
two vowels are together, the first is long and the second silent," applies
to only 45% of words at the primary level (Clymer, 1963), and to only
18% of words beyond the primary grades (Emans, 1967). This rule certainly
does not hold for such words as ocean, great, bread, pause, aunt, and
kraut. Of course, many rules are upheld consistently (e.g., c is
pronounced as /s/ before e, i, and y, and as /k/ before most other letters),
but which of the rules are worth learning and warrant the instructional
time required to teach and practice them?



Second, the issue of mastery learning theory and its application
to reading instruction must be examined. Traditionally, teachers
interpret scores of, say, 80% to mean "mastery" of a particular skill. This
notion of an arbitrary standard is reinforced by the fact that most of
the criterion-referenced tests used across the country have established
cutoff scores of 70%, 80%, or 90% as indicators of mastery. To date,
however, there has been no empirical verification that any single
percentage correct score should indicate mastery. Moreover, there is
disagreement among educators about what mastery really is. Some educators view
mastery as an absolute state of proficiency; partial mastery is as illogical
a concept as partial pregnancy or partial death. But it is unnatural
to view the mastery of reading subskills in an absolute sense, because
factors such as measurement error and attention to task must be considered
on a continuum. Considering the most commonly accepted criterion-mastery
level of 80%, it is justifiable to ask, "Why 80%?" And if 80%, "80% of
how many of what?" And "what does mastery of a skill contribute to
overall reading comprehension?"

Despite the lack of an empirically based cutoff score, the notion
of mastery has strong implications for reading instruction in the
classroom. An extensive review in the area of mastery learning theory was
therefore undertaken by the Project (see the third section of this paper).
Because the ultimate goal of reading instruction is successful reading
comprehension, the mastery levels (performance guidelines) in this study
were established using subsamples based on performance on a global
measure of reading comprehension.

In order to individualize reading instruction, teachers must be able
to effectively assess the major word identification skills. Successful
individualization depends on valid and reliable diagnostic tests that
delineate areas of strength and weakness for specific word identification
subskills. A primary goal of the Project was the development of a Word
Identification Test battery that assesses the phonics and structural
analysis skills of elementary school children.

In spring 1977, a prototype of the Word Identification Test battery
was developed which assessed skills in three broad areas of word
identification: phonics, structure, and context. In order to provide
baseline information, the battery also included a section assessing reading
readiness skills. The prototype was pilot tested on a total of 282
pupils in grades one, three, and five. Following the data analysis,
revisions were made on individual subtests; in winter 1977 and spring
1978, the revised Word Identification Test battery (without the reading
readiness component) and the Reading Subtest of the Metropolitan
Achievement Tests (Farr, Prescott, Balow, & Hogan, 1978) were administered to
approximately 1,150 second, fourth, and sixth grade public elementary
school children from five regions of the United States (Johnson, Pittelman,
Schwenker, Shriberg, & Morgan-Janty, 1978). Following analysis of the
data from these test administrations, the criteria which guided test
construction were evaluated, and additional criteria were incorporated
in the development of the present tests (Johnson, Pittelman, Schwenker,
& Shriberg, 1979; Johnson, Shriberg, Pittelman, & Schwenker, 1979). In
addition, the Project decided to limit the development of the Word
Identification Test battery to the areas of phonics (with Consonants and
Vowels Subtests) and structural analysis (with Inflected Endings and
Affixes Subtests). In an effort to gain as much information as possible
in these areas, the Project conducted an extensive review of instructional
practices and existing test instruments on phonics and structure (see
second section of this paper) (Johnson, Pittelman, Schwenker, & Shriberg,
1979; Johnson, Shriberg, Pittelman, & Schwenker, 1979).

Between winter 1978/79 and winter 1979/80, the revised phonics and 
structure subtests were administered to several hundred pupils in grades 
two through five. The primary purposes of these studies were to obtain 
item analysis information prior to preparation of the final version of
the tests and to evaluate the test directions and administrator's manual
for each subtest (Johnson, Pittelman, Schwenker, & Shriberg, 1980).

In spring 1980, the final version of the Word Identification Test 
battery was administered to approximately 600 first through fifth grade 
elementary school students. The performance data were used to examine 
correlations between reading subskills, as measured by the various
subtests in the Word Identification Test battery, and global comprehension
ability, as measured by a standardized test of reading comprehension.
In addition, empirically based levels of skills mastery were established
for each of the reading subskills assessed in the battery.
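
The idea of performance guidelines stratified by grade level and
comprehension ability can be illustrated with a minimal sketch. The
report does not state here which statistic defines a guideline, so the
use of a group mean is an assumption, as are the function name, the
record fields, the stratum labels, and the sample records (which are
hypothetical and are not data from the study).

```python
from collections import defaultdict
from statistics import mean


def performance_guidelines(records):
    """Return mean percent correct per (grade, stratum, subtest) group.

    Assumption: a guideline is taken to be the group mean; the report
    itself does not specify the statistic in this section.
    """
    groups = defaultdict(list)
    for r in records:
        key = (r["grade"], r["stratum"], r["subtest"])
        groups[key].append(100.0 * r["correct"] / r["items"])
    return {key: round(mean(values), 1) for key, values in groups.items()}


# Hypothetical records for illustration only.
records = [
    {"grade": 1, "stratum": "low", "subtest": "Consonants", "correct": 12, "items": 30},
    {"grade": 1, "stratum": "low", "subtest": "Consonants", "correct": 15, "items": 30},
    {"grade": 5, "stratum": "high", "subtest": "Consonants", "correct": 29, "items": 30},
]

guidelines = performance_guidelines(records)
for key, value in sorted(guidelines.items()):
    print(key, value)
```

Under this assumption, each subtest receives a separate expected score
for every grade-by-comprehension group instead of one cutoff for all
pupils.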

The present report presents a review of educational practices and 
widely used assessment instruments in phonics and structure, a historical 
discussion of the issue of mastery learning theory, and documentation
of the results of this final investigation. 



A REVIEW OF THE INSTRUCTIONAL TRENDS AND ASSESSMENT
INSTRUMENTS IN THE AREAS OF PHONICS AND STRUCTURE 

As part of the procedure to develop a test battery to assess word 
identification skills in the areas of phonics and structural analysis,
a survey of existing assessment instruments was conducted. Because the
survey revealed that there were no valid and reliable instruments currently
available, a review of the instructional trends for phonics and structural
analysis was undertaken. It is interesting to note that while there
has been a great deal of research on instruction in phonics, there appears
to be a lack of agreement as to what should be taught and assessed in 
the area of structural analysis. A primary source used for describing 
the instructional trends in phonics was Word Identification—Instructional 
Practices: The State of the Art, by Johnson and Baumann (in press).

Phonics 

Prior to 1800, reading instruction in America emphasized a strong 
synthetic phonics approach. Later, in the early 1800s, Horace Mann
introduced the "whole-word" method of teaching reading. This new method
prevailed until the second half of the nineteenth century, when phonics
again became popular. Rigorous phonics programs dominated the reading
and language curricula from about 1880 to 1915.

Between 1915 and 1940, research on the teaching of word identification
skills centered on the relative merits of a phonics versus a whole-word
or look-say approach to reading. The majority of researchers in this
period who compared phonics and look-say methodologies noted superior
results for instruction in phonics (Currier, 1923; Currier & Duguid,
1916; Garrison & Heard, 1931; Tate, 1937; Valentine, 1913). A classic
study by Agnew (1939) found that primary grade children who received
reading instruction with a heavy emphasis on phonics scored higher on
tests of phonics ability, word pronunciation, oral reading, and
vocabulary than did children of the same age instructed in the look-say
approach.

While research tended to support the efficacy of phonics as the 
most efficient means of teaching word identification skills, no dominant 
set of instructional practices emerged. The purpose of phonics
instruction is to teach children how to pronounce "unknown" words. In order
for phonics analysis to be effective, however, the "unknown" word must
be in a child's speaking or listening vocabulary. The assumption is
that the ability to pronounce the unknown word will automatically cue
its meaning in semantic memory (Johnson & Baumann, in press).

The most popular approach to teaching phonics is based on the 
premise that if children are able to analyze words by segmenting them 
into parts, they should be able to recombine (blend) these parts into 
new units, thereby enabling them to transfer and apply this skill in 
decoding unfamiliar words. Thus, the skill of segmentation appears to 
be prerequisite for the ability to successfully blend. Children who 
could segment syllables were successful in blending training, which in 
turn facilitated the learning of words. Research has shown that both
segmentation and blending must be mastered if a phonics approach is to
be successful for generalizing to the reading of unfamiliar words (Fox &
Routh, 1976; Jeffrey & Samuels, 1967; Jenkins, Bausell, & Jenkins, 1972;
Muller, 1973).

The act of decoding, then, appears to be a three-stage process:
children are initially taught letter-sound correspondences by analyzing
words in their speaking and listening vocabularies; they are then taught
to segment words into phonemic units; and finally, they are instructed
in the skill of blending these isolated sounds into known and previously
unknown words. It is this last step, blending, that has been shown to
be the most crucial in the transfer of phonics analysis skills to the
reading of unfamiliar words (Johnson & Baumann, in press).

According to Venezky and Massaro (1976), this ability to decode
provides a certain degree of independence and self-assurance for
beginning readers; that is, children acquire a manageable set of
letter-sound associations upon which they can build a large number of words.
In addition, phonics instruction, because of its emphasis on regular
letter-sound associations, draws attention to the orthographically
regular features of printed English words: the procedure for analyzing
printed words into subunits for pronunciation facilitates acquisition
of the patterns in our language which are also orthographically regular.
And, in turn, because there are a limited number of ways that sequences
of letters and letter groups can be put together to form English words,
knowledge of this regularity can help the reader resolve the letters in
a string that conforms to the language (Massaro, 1975).

With the widespread recognition of the importance of phonics in the 
reading curriculum, a dependable measure of phonics ability on which to 
base instruction is needed. The Project on the Assessment and Analysis
of Word Identification Skills in Reading has identified several issues
for consideration in the development and evaluation of a phonics
instrument. First, the scope of the test has to be addressed; that is, a
decision has to be made as to which of the hundreds of spelling-to-sound
correspondences in the English language should be selected for assessment.
Next, the format of the test has to be considered: recognition response
or production response, group administration or individual administration,
and decoding or encoding. Finally, the modes in which the target
spelling-to-sound correspondences and the response choices are presented must be
determined. Pikulski and Shanahan (1980), after surveying a number of 
phonics tests (most of which were subtests of larger diagnostic test 
batteries), concluded that, to date, there was no phonics instrument 
available that rendered a systematic assessment of phonics skills. 

A survey of the phonics components of nine popularly used tests was
conducted by the Project. Following the issues identified above,
several phonics tests and phonics components of diagnostic and achievement
tests were evaluated: the California Achievement Tests (McGraw-Hill,
1977), the Botel Reading Inventory (Botel, 1961), the Prescriptive Reading
Inventory (McGraw-Hill, 1972, 1976), the Skills Monitoring System for
Reading and Word Identification (Harcourt Brace Jovanovich, Inc., 1975),
the Wisconsin Design for Reading Skill Development (Otto, Miles, Kamm,
& Stewart, 1972-1975), the Phonics Knowledge Survey (Durkin & Meshover,
1964), the California Phonics Survey (Brown & Cottrell, 1963), the
Stanford Achievement Test (Madden, Gardner, Rudman, Karlsen, & Merwin,
1970-1974), and the Silent Reading Diagnostic Tests (Bond, Balow, &
Hoyt, 1970).




ISSUES OF TEST CONSTRUCTION 
Scope of Test

One important issue that was addressed by the Project concerned the
selection of correspondences for assessment. Results of the survey
revealed that most phonics instruments assess only a small number of the
hundreds of spelling-to-sound correspondences in our language. The
California Achievement Tests, for example, consist of only 25 items; 10
items assess the entire consonants category (single-letter consonants,
consonant digraphs, and consonant clusters), 13 items assess single-letter
vowels (all either long or short), and 2 items assess vowel clusters or
diphthongs. It is questionable whether performance on only a few items
should form the basis for global judgments regarding children's overall
competence with phonics.

In addition to the number of correspondences to select for
assessment, attention must also be focused on how often these correspondences
occur in our language. Many of the tests reviewed assessed correspondences
that have low frequencies of occurrence in the English language. For
example, is it important to assess the vowel cluster oa as /ɔ/ (as in
broad) when oa as /ɔ/ appears only 9 times in the 20,000 most common
English words? Children who learn the correspondence will have little
occasion to apply it in decoding unknown words. Johnson and Baumann
(in press) maintain that a diagnostic instrument should reflect the
information learned in the classroom. By selecting for assessment only
those correspondences which appear frequently in curriculum materials,
this notion of ecological validity is upheld.

There is yet another issue: word position of the target
correspondence. In many of the tests surveyed, a large number of the
correspondences are not assessed in the position(s) in which they most typically
occur in English words. The Wisconsin Design, for example, assesses the
single letter v in final word position. Data from the Venezky (Note 1)
tabulations of spelling-to-sound correspondences indicate that v appears
only twice in final position (although it appears 353 times in initial
position) in the 20,000 most common English words. One has to consider,
therefore, whether it is educationally prudent to assess v as /v/ in
final position.
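
The frequency and word-position arguments above can be sketched as a
simple filter over a table of correspondence counts. The three counts
shown are the ones quoted in the text; the cutoff of 50 occurrences, the
"any" position label for oa, and all names in the code are hypothetical
choices made for illustration, not criteria stated by the Project.

```python
from typing import NamedTuple


class Correspondence(NamedTuple):
    spelling: str
    sound: str
    position: str  # "initial", "medial", "final", or "any"
    count: int     # occurrences in the 20,000 most common English words


def select_targets(entries, min_count):
    """Keep correspondences occurring at least min_count times, most frequent first."""
    kept = [e for e in entries if e.count >= min_count]
    return sorted(kept, key=lambda e: e.count, reverse=True)


entries = [
    Correspondence("oa", "/ɔ/", "any", 9),       # count quoted in the text
    Correspondence("v", "/v/", "final", 2),      # count quoted in the text
    Correspondence("v", "/v/", "initial", 353),  # count quoted in the text
]

MIN_COUNT = 50  # hypothetical threshold, chosen only for this example
targets = select_targets(entries, MIN_COUNT)
for t in targets:
    print(t.spelling, t.sound, t.position, t.count)
```

With this cutoff only v in initial position would be retained, which
mirrors the reasoning that rare correspondences and atypical positions
give children little occasion to apply what is assessed.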

Format of Test 

The most accurate procedure to use in assessing phonics skills is
an oral productive task. The ideal phonics test would require the child
to read aloud, while the examiner would record all pronunciation errors
made on unfamiliar words. Pikulski and Shanahan (1980) agree that the
functional use of phonics occurs when the examinee is presented with
letters or words and is required to produce some audible response, a
procedure generally used by individually administered tests. While an
individually administered productive task would best reflect the ability
to apply phonics knowledge, consideration must also be given to efficiency
of assessment. Because there is no feasible way to obtain oral responses
from examinees in a group testing situation, a group administered test
must have a recognition rather than a production format.

The Phonics Knowledge Survey (Durkin & Meshover, 1964) is an example
of an oral production test. Children view separately each of 14 consonants
and the 5 single-letter vowels and are asked to pronounce the corresponding
sounds for the consonant letters, and the 5 long and 5 short corresponding
sounds for the vowel letters. This productive method of assessment is
not efficient, however, because of the time needed to administer the
test to each individual child. The validity of the Phonics Knowledge
Survey can also be questioned in terms of assessing letter-sound
correspondences in isolation, rather than within the context of words.

The Botel Reading Inventory (1961) , also a productive test, utilizes 
a written format which can be administered to a large group or class. 
The written format, however, puts emphasis on the encoding
(sound-to-spelling), rather than on the decoding (letter-to-sound), process.
Hence, spelling performance, instead of phonics ability, is being
measured.

In summary, group administered tests using a recognition format are 
more efficient than individually administered tests using a production 
format. Although there is some minimal evidence to suggest that
recognition phonics tasks may be easier to perform than production tasks
(e.g., Guthrie, 1973), Pikulski and Shanahan (1980) point out that there
are a variety of recognition and production formats that appear to vary
considerably in difficulty. In other words, there may be more variation
in difficulty between different recognition test formats, or between
different production test formats, than between a recognition test format
and a production test format.

Presentation of Target Spelling-to-Sound Correspondences
and Response Choices

Another issue central to the evaluation of an effective phonics
instrument is whether the letter-sound correspondences being assessed
are presented in isolation, in real words, or in synthetic words. The
value of assessing sounds in isolation is questionable because (a) it
is not possible to produce the sounds associated with consonants or
consonant clusters without adding a vowel sound (Groff, 1977); (b)
producing sounds in isolation is an incomplete activity, and is therefore
not sufficiently predictive of functional ability in phonics (Pikulski
& Shanahan, 1980); and (c) the sounds of many letters, especially vowels,
are determined by their orthographic environments (Chomsky & Halle, 1968;
Venezky, 1967).

One alternative to assessing letters in isolation is to present the 
target letters within a word. But the problem inherent in using actual
words is that children may recognize the words as sight words and, hence,
might not need to utilize a decoding strategy. The Skills Monitoring
System for Reading and Word Identification (1975), for example, assesses
one letter-sound correspondence for ch in the real target word, each.
The four response choices are dish, Christmas, anchor, and chair. Because
all four response choices include actual pronunciations for ch, children
must rely on prior knowledge of the pronunciations of the target word
and the response choices to arrive at the correct answer. This implies
that children must recognize these words as sight words, and the correct
answer is reached through auditory matching rather than decoding. Another
problem in the use of real words is that the words are taken from word
lists which are often generated through random selection of words from
reading materials and, therefore, may not allow for a careful evaluation
of a systematic full range of phonics skills (Pikulski & Shanahan, 1980).

A second alternative to assessing letters in isolation is to present 
the target letters for a correspondence within a synthetic word. This 
format allows letter-sound correspondences to be presented within appro- 
priate orthographic environments, and requires that children use phonics 
(decoding) skills rather than a sight word approach. One concern with
the use of synthetic target words, however, is ensuring that the synthetic
words conform to phonological principles of the English language. Sipay
(1974) states that, "if nonsense syllables are difficult to pronounce,
or if the letter sequence confuses the learner, the examiner may be misled
into concluding that the learner has weak word analysis skills" (p. 5).
In the tests surveyed by the Project, the synthetic words did not always
show phonological conformity. For example, in the Phonics Knowledge
Survey, children are asked to pronounce the sound made by the a in the
synthetic target word, aef. The correct answer is given as long a,
because children are expected to apply the "rule" that when two vowels
are together, the first vowel says its name and the second vowel is
silent. But Venezky's (1970) tabulations show that ae in initial position
is never pronounced as long a. Pikulski and Shanahan (1980) object to
the use of nonsense words because "the examinee is deprived of the
opportunity to match the arrived at pronunciation for a test word with
a word that is a part of his or her vocabulary."

Careful attention must also be given to the development of the
response choices in a phonics test. One question to address is how many
response choices should be developed for each item, because the number
of response choices can affect the reliability of the test. Many of the
tests which the Project surveyed had true-false, same-different, or
yes-no formats, which greatly increase the likelihood that students will
arrive at correct answers by guessing. Most of the tests, however, had
a multiple choice format, with the number of response choices varying
from three to five.
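
The effect of the number of response choices on guessing can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes blind, equally likely guessing on every item (an idealization the surveyed tests do not report), and the 20-item test length and the function name chance_score are hypothetical.

```python
from fractions import Fraction


def chance_score(num_items: int, num_choices: int) -> Fraction:
    """Expected number of items answered correctly by blind guessing.

    Assumes every response choice is equally likely to be picked,
    which is an idealization rather than a finding of the report.
    """
    if num_choices < 2:
        raise ValueError("an item needs at least two response choices")
    return Fraction(num_items, num_choices)


# Hypothetical 20-item test under the formats discussed above.
for label, k in [("two-choice (true-false)", 2),
                 ("three-choice", 3),
                 ("five-choice", 5)]:
    expected = chance_score(20, k)
    print(f"{label}: about {float(expected):.1f} of 20 items by chance")
```

Under this assumption, a two-choice format lets half of the items be answered correctly by chance, which is why the number of response choices bears on reliability.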

Another consideration affecting the development of response choices 
is the number of syllables in the words used as response choices. Many
of the tests reviewed were not consistent in controlling for the number
of syllables in the response choices within an item. The Wisconsin
Design, the Skills Monitoring System for Reading and Word Identification,
and the Stanford Achievement Test, for example, all include response
choices with varying numbers of syllables. In this regard, educators
(e.g., Massaro, Note 2) have expressed concern that decoding a
multisyllable word may require more complex processing than decoding a
one-syllable word, and that the inclusion of both kinds of words within a
given test item may confuse young children.

A third issue in evaluating response choices relates to the position 
of the target letters of the correspondence within a response choice 
word (initial, medial, or final). According to Venezky (Note 1), the 
position(s) of greatest occurrence varies for different letters. It
seems logical, therefore, to assess a letter-sound correspondence in the
word position in which it most frequently occurs. The position of the
target letters within a target word should also match the position of
the letters within each of the response choices. Several of the reviewed
tests include items that contain shifts in the word position of target
correspondences. The Prescriptive Reading Inventory, for example,
has an item in which short a is presented in medial position of the
CVC word, cat. One of the response choices is the CV word, day, which
has its vowel sound in final position. Young children may be required
to use different, more complicated psychological processes when the
correspondences of interest shift in position than when the target
correspondences are all in the same position within words.

A fourth issue related to the development of response choices is 
the mode in which these response choices appear. All but two of the 
reviewed tests (the Phonics Knowledge Survey and the Botel Reading
Inventory are productive tests) include response choices in the form of 
isolated single letters or letter clusters, real words, or pictures. 

The validity of utilizing single-letter or letter cluster response 
choices (as in the Prescriptive Reading Inventory and the Silent Reading 
Diagnostic Tests) is questionable. In a review of the Silent Reading 
Diagnostic Tests, for example, Kress (1972) refers to the response 
choices as "artificial graphic representations," and is dubious about
whether they truly measure the phonics abilities being assessed: ". . .
in Test 6, for the beginning sound in natural, the child is to select
pn, from gn, un, tn, nt; in Test 7, for the ending sound in decay, he
is to select cpaet from khaJc, kayn, cove, quet; in Test 8, for the vowel
sound at the beginning of the word win, he is to select i from x, i, v,
a."

In the tests reviewed, the most common form of response choice is
the real word (the Prescriptive Reading Inventory, the Skills Monitoring
System for Reading and Word Identification, the California Phonics
Survey, the California Achievement Tests, and the Stanford Achievement
Test). One of the problems with using real word response choices is
the likelihood of testing visual matching, rather than decoding. In
the Stanford Achievement Test, for example, children are presented with
the target letter t in the real word ten. The three response choices
are gate, nine, and been. Children can easily select the correct
answer by visually matching the t in ten with the t in gate.

Of all the tests inspected, the Wisconsin Design was most consis- 
tent in using response choices in picture form. The use of pictures 
as response choices eliminates the problems associated with real or 
synthetic words—namely, the visual matching of real words, the 
recognition of real words as sight words, and the concern that synthetic
words may not conform to phonological rules of the English language.
In the Wisconsin Design, the examiner pronounces the picture names of
the target word and of all the response choices. The focus of this
test, however, is on the auditory matching of sounds, rather than on
the decoding of letter-sound correspondences.

THE PHONICS COMPONENT OF THE WORD IDENTIFICATION TEST BATTERY

For the past three and a half years, the Project on the Assessment
and Analysis of Word Identification Skills in Reading has been developing
a phonics assessment instrument that addresses all of the issues discussed
above. The final version of the Phonics Test presents target
correspondences in synthetic words that are phonologically accurate and four
response choices in picture form. A format consisting of synthetic
target words and pictorial response choices ensures that the instrument
comes as close as a recognition test can to assessing true phonics skills,
rather than visual matching of letters and auditory matching of sounds.

The Phonics Test assesses 73 different spelling-to-sound
correspondences with a total of 146 items. The Phonics Test comprises
two subtests: a Consonants Subtest, composed of 42 single-letter
consonant items, 38 consonant cluster items, and 10 consonant digraph items;
and a Vowels Subtest, composed of 10 short vowel items, 10 long vowel
items, 8 other single-letter vowel items, and 28 vowel cluster items. Each
spelling-to-sound correspondence is tested with two items. Selection
of target items was based on frequency data from the Venezky (Note 1)
tabulations of spelling-to-sound correspondences of the 20,000 most
common English words. Response choices are based on speech production
data and perceptual information from the Bouma (1971), Miller and Nicely
(1955), and Peterson and Barney (1952) confusion matrix studies.
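
The item counts reported above can be checked against one another. The short sketch below simply re-adds the figures given in the text; the variable names are illustrative, and nothing beyond the stated counts is assumed.

```python
# Item counts as stated in the text for the two Phonics Test subtests.
consonants = {
    "single-letter consonants": 42,
    "consonant clusters": 38,
    "consonant digraphs": 10,
}
vowels = {
    "short vowels": 10,
    "long vowels": 10,
    "other single-letter vowels": 8,
    "vowel clusters": 28,
}

# The text states that each correspondence is tested with two items.
ITEMS_PER_CORRESPONDENCE = 2

consonant_items = sum(consonants.values())
vowel_items = sum(vowels.values())
total_items = consonant_items + vowel_items
correspondences = total_items // ITEMS_PER_CORRESPONDENCE

print(consonant_items, vowel_items, total_items, correspondences)
```

The category counts sum to the 146 items reported, and dividing by two items per correspondence gives the 73 correspondences reported.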

Indeed, phonics instruction has been and will remain an integral 
part of most beginning reading programs. An effective and efficient
instrument for assessing the phonics skills of primary school children
was therefore developed to help teachers plan and evaluate instruction. 

Structural Analysis

Structural Analysis is a strategy of word identification by which 
a reader determines the meaning of an unfamiliar word by identifying
its meaningful parts (Robinson, Monroe, Artley, Huck, & Jenkins, 1965;
Schubert, 1969; Johnson & Pearson, 1978). This process involves
analyzing words and dismantling them into units of meaning (i.e., roots,
inflected endings, syllables, prefixes, and suffixes), identifying the
individual meanings, and then recombining these parts into a meaningful
whole (Johnson & Baumann, in press). Johnson and Baumann state that
structural analysis also aids in the pronunciation of unknown words;
Moyle (1974) concurs, and defines structural analysis as "a method of
analyzing a printed word to determine its meaning by identifying meaningful
parts . . . which in turn may be blended into the sound of the word."
Although educators differ in regard to definition, it is generally
agreed that the primary purpose of structural analysis is to assist the
reader in looking for the familiar, meaningful parts in words that are
unfamiliar as a total unit.

Instruction in structural analysis skills is aimed at helping the 
reader identify the meaningful units of an unknown word. By analyzing
the structure of a word in this way, a reader can often approximate the
meaning of a new word.

Most reading professionals recommend direct instruction in struc- 
tural analysis skills (Pearson & Johnson, 1978; Farr & Roser, 1979; 
Karlin, 1971; Smith & Johnson, 1976; Spache, 1963; Stauffer, 1969),
although there is a lack of agreement about the actual content that
should be taught. Reading methods texts differ as to which skills to
emphasize: some advocate an analytical approach using word configuration
and context, while others promote a synthetic approach stressing
letter-sound relationships and structural analysis (Witty, Freeland, & Grotberg,
1966) . 

According to Spache (1963), instruction in structural analysis skills
should proceed from basic shape or configuration, to phonic clues,
compound words, and syllabication in the primary grades; then to roots,
prefixes, and suffixes in the intermediate grades. Witty, Freeland, and
Grotberg view the structural analysis hierarchy as simple suffixes,
compound words, prefixes, root words with inflected endings, and
syllabication, dispersed from the primer to the third grade reader. In a
more recent study, Otto and Chester (1976) advocate mastery of one skill
level before proceeding to a more difficult skill. They identify six
levels of difficulty within structural analysis skills: base words with
prefixes and suffixes, singulars and plurals, syllabication, accent,
unaccented schwa, and possessive forms. Educators who support the
synthetic approach for instruction (Bond & Wagner, 1969; Lamb & Arnold,
1976; Johnson & Pearson, 1978) generally agree that structural analysis
should encompass derivatives, variants, and compound words. Because 
this approach is gaining widespread popularity in language arts and 
reading programs throughout the United States, a discussion of these 
terms is presented below. 

DERIVATIVES 

As defined by Schubert (1969) , derivatives are root words with a 
prefix and/or suffix. Deighton (1959) classified variant and invariant
affixes into groups containing 68 commonly used prefixes and 100 commonly
used suffixes, and concluded that at least two-thirds of these derivatives
provide clues to word meanings. Similarly, Kean and Personke (1976)
and Breen (1960) have compiled lists of affixes that should be taught, 
and Stauffer (1942) and Osburn (1925) each identified prefixes which 
warrant instructional attention. 
Because derivatives affect meaning, pronunciation, and spelling
patterns, there is concern as to which methods are best for teaching
these forms. Should a list of variant and invariant derivatives be
memorized? Should they be taught as visual units? Or, can the deriva- 
tives be learned through a combined approach? 

It is generally agreed that the task of locating small words within 
larger words is misleading to students (Ekwall, 1970; Johnson & Pearson,
1978; Smith & Johnson, 1976; Spache, 1976) because pronunciation and 
meaning are affected by the letter arrangements. (Notice the "little 
words" in father, s^ne, wine, potato , and honey.) But, although educators 
may concur about what not to teach, they are often in disagreement about 
what methods and skills should be taught. For example, Aaronson (1971) 
found that students profit most from a structured word list dictionary 
approach, but Spache (1976) proposed teaching derivatives only as visual
and pronounceable units.

Otto and Chester (1976) support an approach which is structured 
and hierarchical in nature. They do not advocate memorization; instead 
they emphasize direct deductive instruction, which is supported by many 
of the major basal series (American Book Co., 1968-1972; Ginn 720, 1976;
Harper and Row Publishers, 1966; Scott Foresman Publishers, 1965). Most
educators (Ekwall, 1970; Johnson & Pearson, 1978; Lamb & Arnold, 1976;
Smith & Johnson, 1976) would agree with Otto and Chester that memorizing
long lists of derivatives is a meaningless exercise. But teaching methods
such as the use of drill cards for word formation and discussion (Ekwall,
1970), subdividing unfamiliar words into meaningful parts (Smith &
Johnson, 1976), building new words from familiar roots (Johnson & Pearson,
1978; Lamb & Arnold, 1976; Smith & Johnson, 1976), and reading to apply
derivatives within a contextual framework (Johnson & Pearson, 1978;
Smith & Johnson, 1976) are strongly advocated.



VARIANTS

Variants are words which contain a root and an inflected ending
(Lamb & Arnold, 1976). Variant endings (often called inflectional
endings) change a root word so that it conforms to its grammatical
environment. They change verbs by time agreement, adjectives by
comparison, nouns by number agreement, and adverbs by degree. Common examples
of variant endings are: s, es, ed, ing, er, and est. According to
Kean and Personke (1976), the eight inflections of the English language
are: noun plurals, noun possessives, present tense third person singular
verb, past tense verb, present participle verb, past participle verb,
comparative adjective, and superlative adjective.

Variants are affected by contextual setting, and are most commonly
taught within a hierarchical structure that is dependent upon degree
of difficulty, grammatical class, and usage within context (Ekwall, 1970;
Johnson & Pearson, 1978; Lamb & Arnold, 1976; Otto & Chester, 1976; Smith
& Johnson, 1976). Although educators have differences of opinion
regarding instruction in inflected endings, there is general agreement
that knowledge of variants can help children analyze unknown words.

COMPOUND WORDS

A compound word is one in which two morphemes, each of which could
stand alone as a root word, are combined to form one new word (Lamb &
Arnold, 1976). Helping children to identify the two word units of a
compound word may serve as an early introduction to structural analysis
skills (Smith, 1963).

Most basal series include lessons which introduce and give practice 
in identifying compound words. Johnson and Pearson (1978), however,
have suggested a unique approach to the study of compound words. They
recommend that children be made aware of the underlying structural
relationships of compound word units, and provide a structural breakdown
of six different compound word relationships:

1) B is of A: A fishbone is a bone of fish.

2) B is from A: Hayfever is fever from hay.

3) B is for A: A dog biscuit is a biscuit for a dog.

4) B is like A: A boxcar is a car like a box.

5) B is A: A nobleman is a man who is noble.

6) B does A: A crybaby is a baby that does cry.

Syllabication is another word identification skill in which children 
often receive a great deal of instruction. While some educators advocate 
instruction in syllabication (Gates, 1947; Gray, 1960; Karlin, 1971;
Osburn, 1954; Smith, 1963), the value of instruction in syllabication
has been questioned by others (Deighton, 1959; Durrell, 1956; Glass,
1965; Groff, 1971; Johnson & Pearson, 1978; Spache & Baggett, 1966;
Zuck, 1974). Many students appear to use the sounds represented by the
word parts to determine the number of syllables, rather than vice versa.
If this is the case, the use of syllabication as a word analysis tool
is of little value (Lamb & Arnold, 1976). Another criticism of
syllabication as an aid in word recognition is that the dividing point
between syllables is not always clear (Wardhaugh, 1966). For example,
children are often taught that when dividing words into syllables they
should divide VCCV patterns between the consonants. But if the rule
were applied to words like summer, mother, or father, the pronunciations
rendered would be inaccurate. For example, in syllabicating the word
father, the resultant syllables would be fat-her, leading to incorrect
pronunciation of the word.

Despite differences in their approaches to teaching structural
analysis, writers of methods texts, basal series publishers, and reading
theorists do agree on one point: Structural analysis is an integral
part of reading instruction, and the end result of such instruction
should be the understanding of meaning from context. Structural analysis
skills should be used (and taught) in conjunction with other word
identification skills. The goal is to integrate structural analysis skills
as one strategy which allows a reader to segment an unknown word into
meaningful parts, and then to recombine these meaningful parts to make
the total word recognizable, thereby facilitating comprehension.

STRUCTURE COMPONENT OF THE WORD IDENTIFICATION TEST BATTERY

The structure component of the Word Identification Test battery 
assesses two areas of structural analysis which theorists agree
are crucial to the development of word identification ability:
derivatives and variants. The battery includes an Affixes Subtest and an
Inflected Endings Subtest. The third subtest in the battery, Contractions
& Possessives, assesses the two uses of the apostrophe. The ability
to distinguish between the two uses of the apostrophe is important for
obtaining the intended meaning of connected text which, in turn, affects
comprehension.

During 1969-1974, the writing mechanics of 9-, 13-, and 17-year-old
students were examined by the National Assessment of Educational Progress
(1975). The uses of the apostrophe in contractions and to show possession
were included among the objectives considered important for students. 
That understanding the two uses of the apostrophe is necessary is re- 
inforced by Lloyd and Warfel (1972) , who noted that proofreaders of 
newspapers, advertisements, and weekly magazines often erroneously leave 
the apostrophe in "its" when the possessive pronoun, and not the con- 
traction of "it is" or "it has", is intended. 

Earlier versions of the Word Identification Test battery included
a subtest assessing compound words; however, the subtest was eliminated
from later versions of the battery. Although instruction in compound
words is a valuable aid in structural analysis, assessing a child's
knowledge of compound words often becomes simply a measure of vocabulary
knowledge, and the understanding of the underlying structure of a
compound word is not readily transferable to unfamiliar compound words.
The authors, therefore, decided that the allocation of instructional
time to assessment of compound words could not be justified.

The earliest version of the Word Identification Test battery also
included a subtest assessing syllabication. As discussed earlier, however,
the allocation of time to assessment, and perhaps even to instruction
in syllabication skills, has been questioned by many educators. Deighton
(1959) summarized this position when he asserted, "To insist on mastery
of 'rules' of syllabication is to make syllabication an end in itself . . .
the 'correct' way to divide a typed or printed word is of importance
only to stenographers and printers, never to readers."

Inflected Endings Subtest 

The selection of inflected endings to be assessed in the Inflected
Endings Subtest was based on a review of scope and sequence charts of
basal reading series, on published tests for inflected endings, and on
frequency information.

To determine which inflected endings to assess, four basal reading
series were surveyed: Ginn 720 (1976 edition), Houghton Mifflin (1971
edition), American Book Company (1968-1972), and Heath and Company
(1968 edition). All four series prescribed instruction for the following
inflected endings: (i)es, (i)ed, ing, er, s, 's (s'), and (i)est.

While most of the published tests that were reviewed assessed a 
sampling of inflected endings, the inflected ending items were usually 
incorporated into subtests which assessed other structural analysis
skills. The Doren Diagnostic Reading Test of Word Recognition Skills
(1973), on the other hand, had separate subtests to assess inflected
endings (ing [six items], ed [four items], er [two items], and r and s
[one item each]) and singulars and plurals (ies [three items], es [two
items], and s [two items]). Similarly, an analysis of the Wisconsin
Design for Reading Skill Development revealed a separate 12-item subtest
containing two items for each of six inflected endings: ed, s, ing,
's, er, and es. However, detailed documentation of the criteria
governing item selection was not provided for either the Doren or Wisconsin
Design subtests.
Based on information from the review of basal reading series and
published tests as well as from the survey of the literature, it was
decided that the Inflected Endings Subtest should include items which
sampled tense markers, adjectives, and plurals. The inflected endings
selected for assessment were: s (as a plural), (e)s (as a verb), ed,
ing , er , est , and ^. 

The number of items created for each inflected ending was in
proportion to how often that ending occurred in the language. Frequency
information was obtained from the Ginn Lexicon Project Frequency
Listing (Johnson & Baumann, 1979), which is a compilation of four word
frequency lists.² The 734 words in the lexicon which have a total
frequency count of 300 or more were examined for the inflected endings
selected for the test. A similar review was performed with the American
Heritage Word Frequency Book (Carroll, Davies, & Richman, 1971).³ A
detailed description of the development of the Inflected Endings Subtest
is presented in the reports The Assessment of Structural Analysis Skills
(Johnson, Pittelman, Schwenker, & Shriberg, 1979) and Interim Report:
The Refinement of the Test Battery to Assess Word Identification Skills
(Johnson, Pittelman, Schwenker, & Shriberg, 1980).
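
The idea of creating items in proportion to how often each ending occurs can be sketched as follows. This is a minimal illustration, not the Project's procedure: the report does not say how fractional shares were rounded, so the largest-remainder rounding used here is an assumption, the function name allocate_items is hypothetical, and the frequency counts are invented for the example rather than taken from the Ginn Lexicon.

```python
def allocate_items(frequencies: dict[str, int], total_items: int) -> dict[str, int]:
    """Split total_items across endings in proportion to frequency.

    Rounding uses the largest-remainder method, which is an assumption;
    the report does not describe how fractional shares were handled.
    """
    grand_total = sum(frequencies.values())
    if grand_total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    # Exact (possibly fractional) share of items for each ending.
    exact = {k: total_items * f / grand_total for k, f in frequencies.items()}
    # Start from the whole-number part of each share.
    counts = {k: int(share) for k, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand out remaining items to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical frequency counts, NOT taken from the Ginn Lexicon.
example = {"s": 500, "ed": 300, "ing": 150, "er": 50}
print(allocate_items(example, 20))
```

Whatever rounding rule is chosen, the allocated counts should sum to the intended subtest length, which is the property the sketch preserves.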



²The four word frequency lists comprising the Ginn Lexicon Project
Frequency Listing are: Carroll, Davies, and Richman list (1971); Kucera-
Francis list (1967); Moe Picture Book Words (1973); and Moe Oral Language
(1974). A total of . . . different . . . Frequency Listing. The frequency
count for the words ranges from 20 to 164,924.

³The American Heritage Word Frequency Book is a word list based
on an examination of published material for children in third to ninth
grade, and contains 5,088,721 tokens and 86,741 words. A total of 90
schools participated in the study, and over 5 million words of running
text were extracted for analysis from 1,045 different publications.
Affixes Subtest

The selection of affixes for assessment was based on frequency
information gathered from scope and sequence charts of basal reading
series, from published tests of affixes, and from the SWRL Lexicon
(Rhode & Cronnell, 1977).

Four widely used basal series were surveyed to determine which
affixes are consistently taught to elementary school children (Ginn 720,
Clymer et al., 1979 edition; Macmillan, Smith & Wardhaugh, 1975 edition;
Houghton Mifflin, Durr et al., 1974-1978; Scott Foresman, Aaron et al.,
1976). In most series, affixes were introduced in the beginning of
second grade as syllabic word parts and form class markers. They were
later reintroduced as meaningful word parts and as affixes in grades
four through eight. A total of 11 prefixes and 14 suffixes were common
to the instructional sequences of at least three of the four basal series.

The SWRL Lexicon (1977)⁴ was also examined to determine the frequency
of occurrence of words containing the affixes identified through the
basal series survey. Upon completion of the SWRL frequency check,
affixes with frequencies of less than 10 were eliminated from further
consideration. This reduced the initial pool of affixes, derived from
the review of the basal series, to 8 prefixes and 12 suffixes. One of
the prefixes, non, was selected for inclusion, even though it was not
listed in the SWRL Lexicon. This was because non is considered to be
a useful prefix and is taught in all four of the basal series surveyed.
A comparison of these affixes with those included in other teaching
and testing materials (Broska, Hodges, Patrick, Williams, & Oseroff,
1973; Northern Valley Schools, 1976; Otto et al., 1972-1975; Shepherd,
1973) supported the selection of the proposed list of target affixes
for inclusion in the Affixes Subtest.

⁴The SWRL Lexicon is a 10,000-word lexicon of the basic vocabulary
of children in kindergarten through sixth grade. It is a selective
compilation of eight sources which include studies of materials written
for children, materials written by children, and studies of the oral
language of children (Durr, 1970; Entwisle, 1966; Green, Howard, Joergeri,
& Marino, 1958; Jacobs, 1967; Kolson, 1960; Murphy et al., 1957; Rinsland,
1945; and Weaver, 1955).
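
The screening of affixes described above (taught in at least three of the four basal series, a lexicon frequency of at least 10, and an explicit exception for non) can be sketched as a simple filter. This is an illustration under stated assumptions: the function name screen_affixes, its parameters, and all of the counts and frequencies in the example data are hypothetical and are not values from the basal series or the SWRL Lexicon.

```python
def screen_affixes(series_counts, lexicon_freq, min_series=3, min_freq=10,
                   keep_anyway=()):
    """Return affixes passing the basal-series and frequency screens.

    series_counts: affix -> number of basal series teaching it
    lexicon_freq:  affix -> frequency of words containing it
                   (an affix missing from this mapping is "not listed")
    keep_anyway:   affixes retained regardless of lexicon frequency
    """
    selected = []
    for affix, n_series in series_counts.items():
        if n_series < min_series:
            continue  # not common to enough basal series
        freq = lexicon_freq.get(affix)  # None means not listed
        if affix in keep_anyway or (freq is not None and freq >= min_freq):
            selected.append(affix)
    return selected


# Hypothetical data for illustration only.
series_counts = {"un": 4, "re": 4, "non": 4, "mis": 2, "fore": 3}
lexicon_freq = {"un": 120, "re": 95, "mis": 30, "fore": 4}

print(screen_affixes(series_counts, lexicon_freq, keep_anyway=("non",)))
```

Keeping the exception as an explicit argument mirrors the text, where non is retained by judgment rather than by the frequency rule.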

The selection of root words to be affixed also was carefully con- 
sidered. Root words to be combined with the target affixes were chosen 
according to two criteria: (a) the root word should combine with at 
least two other affixes in order to create real word foils for the 
test items; and (b) the root word should be familiar to at least 70% 
of fourth graders, as indicated in The Living Word Vocabulary (Dale &
O'Rourke, 1976).⁵

A list of potential root words, all of which frequently combine
with affixes, was compiled from the fourth-grade vocabulary in the four
basal series surveyed. This list was further modified to obtain at
least four root words to combine with each target affix. Next, potential
root words were checked in The Living Word Vocabulary for their
appropriateness for fourth grade. Root words meeting the 70% familiarity
criterion were then combined with appropriate affixes to create a pool
of response choices.
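
The two root word criteria stated above (the root combines with at least two other affixes, and at least 70% of fourth graders know it) can be expressed as a short filter. The sketch is illustrative only: the RootWord class, the eligible_roots function, and every word, affix set, and familiarity score in the example are hypothetical and are not drawn from The Living Word Vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootWord:
    word: str
    affixes: frozenset   # affixes that combine with the root to form real words
    familiarity: float   # percent of fourth graders familiar with the root


def eligible_roots(candidates, target_affix, min_other_affixes=2,
                   min_familiarity=70.0):
    """Return roots usable for items testing target_affix."""
    result = []
    for root in candidates:
        if target_affix not in root.affixes:
            continue  # root cannot carry the target affix at all
        # Other affixes are needed to build real word foils.
        others = root.affixes - {target_affix}
        if len(others) >= min_other_affixes and root.familiarity >= min_familiarity:
            result.append(root.word)
    return result


# Hypothetical candidates for illustration only.
candidates = [
    RootWord("help", frozenset({"ful", "less", "er"}), 95.0),
    RootWord("care", frozenset({"ful", "less"}), 90.0),
    RootWord("doubt", frozenset({"ful", "less", "er"}), 60.0),
]

print(eligible_roots(candidates, "ful"))
```

In this invented example only the first root passes: the second has too few other affixes for foils, and the third falls below the familiarity threshold.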



Finally, the response choices were reviewed to ensure that their
vocabulary levels were as consistent as possible both within and across
test items. The final pool of target affixes developed through this
process consists of eight prefixes and 12 suffixes. Further documentation
of the test development is presented in the reports The Assessment of
Structural Analysis Skills (Johnson, Pittelman, Schwenker, & Shriberg,
1979) and Interim Report: The Refinement of the Test Battery to Assess
Word Identification Skills (Johnson, Pittelman, Schwenker, & Shriberg,
1980).

⁵The Living Word Vocabulary lists 43,000 words and their percentage
scores based on how familiar the words are to students in grades 4, 6, 8,
10, 12, 13, and 16.

Contractions & Possessives Subtest

The first stage of development of the Contractions & Possessives 
Subtest was based on a two part procedure: (a) the identification of 
those contractions that are typically taught to second, third, and
fourth grade students; and (b) a review of the formats used in the
instruction and assessment of contractions. The four widely used basal
series selected for review were: Ginn 720 (Clymer et al., 1979 Rainbow
Edition), Heath and Company (Witty, Bebell, & Freeland, 1968 edition),
American Book Company (Johnson et al., 1968-1972), and Houghton Mifflin
(Durr et al., 1974-1978 edition). One skills management system, the
Wisconsin Design for Reading Skill Development (Otto et al., 1972-1975),
was also reviewed. A survey of these materials revealed that many of
the contractions are taught by the end of second grade, and that all
common contractions are taught by the end of third grade.

The next stage in developing the Contractions & Possessives Subtest
was to select the contractions to be assessed, and to decide upon the
number of items needed to assess each target contraction. As with the
other two structure subtests, a decision was made to base the number
of target items for each contraction on frequency information. First,
the contractions were grouped into categories based on which member of
the word pair was contracted. For example, contractions of will, such
as I'll, we'll, he'll, and they'll, formed one contraction category.
Next, The American Heritage Word Frequency Book (Carroll et al., 1971)
(see footnote 3) was used to determine the frequencies of each of the
specific contracted forms within the categories. Based on frequency
tabulations of contracted forms within categories, the contraction
categories were then rank-ordered and a proportionate number of specific
contracted forms were selected for inclusion in the Subtest.

In addition to 21 items assessing contractions, ten items were
created to assess possessives, resulting in a total of 31 items on the
Contractions & Possessives Subtest. A detailed account of the development
of the Contractions & Possessives Subtest is presented in the Interim
Report: The Refinement of the Test Battery to Assess Word Identification
Skills (Johnson, Pittelman, Schwenker, & Shriberg, 1980).

A primary goal of reading instruction is the integration of struc- 
tural analysis skills as a strategy for facilitating comprehension of 
the total word in context. All three structure subtests, Inflected Endings,
Affixes, and Contractions & Possessives, utilize a sentence context
requiring students to select a response to complete the sentence.
Response foils are designed to be semantically or syntactically reasonable.
To the extent that an instructional program stresses reading for meaning
and skill practice in context rather than in isolation, the method of
assessment used in the Structural Analysis Subtests is particularly
appropriate.

Summary 

All five subtests comprising the Word Identification battery were 
developed after a careful survey of the literature and of existing
instructional and assessment materials. It was evident from this review
that there was no assessment instrument currently available that addressed
all of the issues discussed above. As a result, the Project on the
Assessment and Analysis of Word Identification Skills in Reading undertook
the development of such an instrument. For all subtests in the Word
Identification Test battery, selection of target items was based on
frequency counts, ensuring that only those elements most frequently
encountered and therefore most generalizable would be assessed. Formats
were carefully designed to avoid features which might confound
interpretation of test results, such as visual matching (for phonics
subtests) and vocabulary knowledge (for structure subtests).

The final version of the Word Identification Test battery is a valid 
and reliable instrument for assessing the phonics and structural analysis 
skills of elementary school students. The battery will provide teachers
with information with which to make important instructional decisions. 



INTRODUCTION TO MASTERY LEARNING



A Historical Perspective

The history of testing human abilities reaches far back into
civilized times. The first recorded testing occurred in China 4,000 years
ago, when civil service examinations were administered to Chinese
government employees (Popham, 1980). Assessment of human abilities in
the United States, however, is a relatively new practice, originating
as recently as the early 1900's. During World War I, the Army Alpha
and the Army Beta (for nonreaders) were developed to assess the intellectual
skills of military personnel.

Over the past 50 to 60 years, measurement specialists have followed 
the mental testing models established during World War I. A great 
effort was put forth to develop tests that would reflect aptitude and 
achievement in almost every subject area. Because scores from these 
instruments could differentiate among individuals in the content areas, 
such testing had its greatest application in schools. 

As the population in the United States increased, students entered
the schools in greater numbers and remained for longer periods of time.
Educators had to develop criteria for determining which students would
be eligible for promotion to a higher grade. Many schools developed
and administered criterion-referenced tests to assess subject matter
considered essential for students to master. A substantial percentage
of students, however, were unable to reach the absolute standards set
by the test objectives. As a result, educators moved toward norm-referenced
assessment testing, whereby a student's performance would be viewed
relative to the performance of his or her peers. The "average"
performance of students within a particular grade became the standard
level of performance that classroom teachers were urged to meet. In
keeping with this standard, teachers developed instructional plans aimed
at the average ability level (Westbury, 1970). In most classrooms,
instruction was based on a single set of materials, objectives, and
procedures, judged to be appropriate for the middle group of students.

The trend toward average-based education remained relatively un-
challenged for several years. Prominent psychologists (Hall, Terman,
Gesell, Kuhlman) asserted that hereditary factors limited human capacity;
environmental assistance, including instructional intervention, was 
considered ineffective in altering nature's decision. Thus, educators 
were given additional fuel for their middle-of-the-road teaching designs 
and continued to administer the established and routine sequences of 
instruction. 

Finally, in the 1920's, Carleton Washburne's Winnetka Plan and
Henry Morrison's work at the University of Chicago's Laboratory School
(Block, 1971) represented attempts to break away from average-based
education towards individualized instruction. Both approaches featured
specific educational objectives, carefully sequenced learning units with
accompanying instructional materials, diagnostic instruments to judge
student progress on each unit, supplemental corrective materials for
students who needed additional help, and a flexible time schedule allowing
children to progress through the units at an individual rate. But despite
Washburne's and Morrison's frameworks for personalized education, the
majority of American schools were not influenced by their efforts. For
nearly the first half of the 20th century, educational thinking continued
to reflect the average-based philosophy, and students' abilities were
assessed with norm-referenced tests.

Finally, in the late 1950's and early 1960's, coinciding with the
surge of scientific advancements, the measurement of abilities through
norm-referenced testing was challenged (Block, 1974; Bloom, 1976).
Jerome Bruner (1960), a pioneer in the new movement, proclaimed that
any subject matter could be taught to any child if certain instructional
adjustments were made. Support for Bruner's position came from a camp
of developmental psychologists who were willing to admit that human
learning could be affected by training and practice as well as by matura-
tion.

Following Bruner, Glaser (1963) pointed out the need to determine
a student's level of proficiency if appropriate instruction is to occur.
Glaser suggested that since student achievement in a given subject
matter ranges from no proficiency to perfect performance, assessment
tools should indicate a student's exact level of proficiency and lead
a teacher to identify the specific skill areas in need of further work.
Glaser's thesis was that teachers needed "information as to the degree
of competence attained by a particular student which is independent of
reference to the performance of others" (Glaser, 1963, p. 520). Thus,
by the middle of the 20th century, a new philosophy of education emerged,
which focused on measuring individual learning abilities through criterion-
referenced testing designs. Models of school learning were developed,
based on minimal scores from these criterion-referenced tests. The new
models represent a phenomenon in education often referred to as mastery
learning.

Competency-Based Education

Mastery learning theory is the underlying concept of competency-
based education (CBE). CBE, in its true form, is a design aimed at
teaching basic subject-related skills. Skills, abilities, and attitudes 
designated as essential to student learning are identified and sequenced, 
objectives for instruction are written, and teaching and testing plans 
for reaching and measuring attainment of the objectives are developed. 
Each student's performance is monitored on a regular schedule to deter- 
mine whether the objectives are being met. CBE rests on the philosophy 
that if appropriate materials and methods of instruction are provided, 
students can attain basic goals set by the school. The successful 
attainment of these goals, in turn, will ultimately enable the student
to lead a productive life (Spady, 1977; Torshen, 1977). 

In CBE, learning is a two-fold procedure: (a) minimum competencies
must be set for all students to attain; and (b) provisions for advance-
ment far beyond the minimum requirements must be designed. Once goals
and objectives are established, alternative materials and methods are
collected to provide opportunities for diversity in teaching and
learning. Time adjustments are made to allow for individual rates of
learning. Ideally, instruction is offered to each learner when and for
as long as it is needed.

Evaluation of a student's achievement in CBE is most often obtained
by using criterion-referenced tests designed to reflect each major
objective in a course of study. A standard of success is specified,
although it is usually arbitrarily determined, and students are expected
to reach or exceed the standard. Students who fail to meet minimum
standards are guided into further individual or group work.

Despite the educational promises that the CBE approach offers, 
implementation has been less than satisfactory. The philosophy of CBE 
has been interpreted in various ways. Some schools claim to be using 
CBE plans when, in fact, they are using traditional instructional 
designs along with criterion-referenced tests. In other schools, CBE
is practiced with the use of individually assigned texts, average-based 
instruction, and end-of-unit criterion-referenced tests. The ultimate 
misuse of CBE, however, comes at the state level where certification 
standards for student performance are established. Testing programs 
are adapted to measure state certification requirements, yet appropriate 
instructional adjustments are not made. Spady (1977) warns that states
have jumped aboard the CBE bandwagon without a specific definition or 
plan for classroom use. With the current trends in education pushing 
toward identifying and measuring competencies, the definition and
implementation of mastery models need to be carefully examined. 

Definition of Mastery Learning

CARROLL'S MODEL OF SCHOOL LEARNING

Throughout the history of education, the basic tenets underlying
mastery learning have appeared from time to time. Psychologists,
teachers, tutors, and parents, searching for ways to help children
learn, have been driven by the belief that learning will occur if
sensitive, systematic instruction is provided. Carroll's (1963) Model
of School Learning has provided a theoretical framework that defines
mastery learning projects across the United States and focuses on the
teacher as the manager of children's learning:

. . . the function of the teacher is to specify what is to
be learned, to motivate pupils to learn it, to provide them
with instructional materials, to administer these learning
materials at a rate suitable for each pupil, to monitor
students' progress, to diagnose difficulties and provide
proper remediation for them, to give praise and encourage-
ment for good performance, and to give review and practice
that will maintain pupil's learnings over long periods of
time (Carroll, 1970, p. 71).

Carroll's (1971) model is built on the premise that all students
could achieve mastery if given enough time and optimal opportunities
to learn. Carroll, however, recognizes that not all students will
achieve mastery of school tasks. For example, a student's unwillingness
to invest adequate time on a task is a variable that obstructs the
learning process. Likewise, children will vary in the degree to which
they benefit from instruction, although most students benefit from
good instruction. Thus, the quality of instruction is important for
school learning. Quality of instruction depends on such teacher charac-
teristics as knowledge of the learning task, appropriate sequencing of
skills, and the ability to measure a child's success at reaching
objectives.

In summary, Carroll believes that school learning is possible for
almost every student when a competent teacher carefully controls the
learning process. Controlling the learning process includes assessment
of how much learning a pupil is gaining from instruction. Carroll's
model rests on frequent and precise measurement. Frequent testing,
administered as a functional part of instruction, should provide feed-
back for teachers to use in planning corrective lessons or in advancing
a student to the next prescribed stage in the learning sequence. Dis-
crete and precise instruments, designed to present items which probe
the mastery of stated instructional objectives, are central to the
successful implementation of Carroll's model.

BLOOM'S THEORY OF SCHOOL LEARNING 

According to Bloom (1976) , the goal of education should be to help 
all people attain the highest quality of life possible through promoting 
full development of each individual citizen: "what any person in the 
world can learn, almost all persons can learn if provided with appropriate
prior and current conditions of learning" (p. 7).

Drawing heavily on Carroll's model, Bloom established a theory of
school learning aimed at achieving this goal by attempting to predict
and explain what he believes should happen in the process of education.
Bloom (1971) interprets Carroll's position in the following way:

... if students are normally distributed with respect to
aptitude for some subject and all students are given exactly
the same instruction (in terms of amount and quality of
instruction and learning time allowed), then achievement
measured at the subject's completion will be normally distri-
buted. Under such conditions, the correlation between aptitude
and achievement will be relatively high (r = +.70 or higher).
Conversely, if students are normally distributed with respect
to aptitude, but the kind and quality of instruction or
learning time allowed are made appropriate to the character-
istics and needs of each learner, the majority of students will
achieve subject mastery. The correlation between aptitude and
achievement should approach zero (p. 50).



Going beyond the Carroll model, Bloom's theory incorporates
student characteristics, instruction, and learning outcomes. These 
interdependent, alterable variables are diagrammed in the model below: 



[Figure: Bloom's model of school learning. STUDENT CHARACTERISTICS:
Cognitive Entry Behaviors, Affective Entry Behaviors. INSTRUCTION:
Quality of Instruction. LEARNING OUTCOMES: Level and Type of
Achievement, Rate of Learning, Affective Outcomes.]

Learning Tasks and Units

A central issue underlying Bloom's theory was the development of
learning tasks. In order for the theory to gain acceptance in school
settings, the tasks had to be adaptable to: (a) group or individually
based instruction; (b) traditional or "open" classroom settings; and
(c) several types of instructional materials and teaching styles. A
learning task was defined as a unit of subject matter requiring between
1 and 10 hours of a student's time. In terms of implementation, a
teacher was expected to examine an entire course of subject material
and divide it into small units of instruction. Sometimes it was necessary
to sequence the tasks in a hierarchical fashion. Each unit, equipped
with a fixed set of objectives, was presented to the student--ideally,
at a rate commensurate with his or her capabilities and under optimal
instructional conditions.

Strategies for feedback and corrective assignments are inherent 
in the unit task plan. Students are expected to successfully complete 
one task before moving ahead to the next level of coursework. Because 
students progress at different rates, Bloom (1971, 1976, 1978, 1980)
has attempted to explain this variance in achievement in terms of his
model. 

Cognitive Entry Behaviors 

Successful completion (mastery) of a learning unit depends to a
great extent on what the student brings to the task. In summarizing
various short-term and longitudinal studies on cognitive entry behaviors,
Bloom (1976) estimates that up to one-half of the variance in achievement
can be accounted for by noting student abilities at the outset of a
learning task. Specific task-related skills and general learning
abilities (communication skills, learning styles, and so on) together
form the total package of prerequisite behaviors. Readiness to learn
also has a powerful impact on a student's success in school at all
levels.

Affective Entry Behaviors

Attitude toward or interest in a learning task has an influence 
on achievement. A student beginning a unit of material with a positive 
attitude is more likely to reach a higher level of achievement than a 
student with a negative attitude. Based on research attempts, Bloom
(1976) claims that the causal link between attitude and achievement may
explain about one-fourth of the variance in achievement scores. Two
types of affect appear to develop as a student progresses through the
educational system: subject-related interest and attitude toward school
learning in general. According to Bloom, children are not born with
a set of affective characteristics; hence it is the teacher's responsi- 
bility to motivate students in the content areas. Positive feelings 
in specific skill areas contribute to the total impression a student 
has of school learning. 

Quality of Instruction

In addition to what students bring to the learning task in terms
of readiness and attitudes, the quality of instruction can have a
significant effect on the learning of school tasks: "Who can learn in
the schools is determined to a large extent by the conditions in the
school; the quality of instruction is a major determiner of who will
learn well--the few or the many" (Bloom, 1976, 438).

Learners vary in the type and amount of instruction needed for 
successful performance. Bloom defines good instruction in terms of 
four aspects of teaching: (a) cues as to exact student responsibilities; 
(b) opportunities for active student participation in learning; (c) re-
inforcement when successful learning occurs; and (d) feedback on tasks
completed and corrective assignments, when necessary. High quality
instruction, desirable at any point in a student's career, is especially
important in the formative stages when basic skills and affective 
characteristics are developing. 

Affective Outcomes 

According to Bloom (1978), one of the most important outcomes of
learning is the influence of the affective domain on the student's
future achievement. An individual's perception of him- or herself as
a learner in a content area not only influences school-related achieve-
ment in that area, but can also have a long-range influence by encouraging
or limiting career choices. Moreover, the overall perception a person
develops regarding achievement across subject areas affects his or her
self-concept, and perhaps even general mental health.

Rate of Learning 

A student does not progress at the same rate during all stages of
learning. Adjustments in time need to be made as the student progresses
through successive learning units. Early units in a sequence usually
demand greater variations of time and adaptive instruction than later
units, because entry skills and attitudes are different. With good
instruction and proper motivation, Bloom's theory (1976) proposes that
variation in learning rates will decrease as a student approaches the
final units in a sequence. Good instruction builds the requisite
cognitive entry behaviors that permit students to achieve unit objectives.
Achievement of objectives helps students to develop positive attitudes 
toward the particular unit and toward themselves as capable learners. 

Level and Type of Achievement 

In review, Bloom has developed a theory of school learning that 
has influenced the development of thousands of mastery learning projects 
throughout the world. School learning, according to Bloom (1980), is
a result of the interaction among the student's cognitive background,
the student's attitudes and interests, and the quality of instruction
provided by the teacher. At each level, a student's present ability
to learn is determined by previous learning along with the quality of
instruction that enhanced or discouraged the learning. The anticipated
outcome of the theory is that most individuals can learn--that is,
achieve mastery--if given sufficient time and appropriate instruction.
If mastery learning theory is effectively implemented into classroom
practice, Bloom (1971) concludes that at least ?5% of all students should
achieve objectives established by the school. Variance in school
achievement should narrow as the model is applied successfully.

Mastery Learning in the Classroom



TWO BASIC INSTRUCTIONAL DESIGNS 

Mastery learning strategies have been applied at every grade level 
in schools throughout the world. Block (1974) has written a descriptive
and comparative summary of the two most popular approaches used in
applying mastery learning theory: Bloom's (1968) "Learning for Mastery
(LFM) Program" and Keller's (1968) "Personalized System of Instruction
(PSI)." Both Bloom and Keller have designed basic plans for implementing
mastery learning theory in the classroom. Inspection of the features 
of each of the two designs will reveal their similarities and differences. 

Bloom's Learning for Mastery Program

Mastery learning in the classroom, according to Bloom's LFM approach,
involves:

1. Well planned group-based, teacher-paced lessons aimed at
minimizing the amount of time needed for achieving instructional objec-
tives.

2. Learning units, devised and sequenced by the teacher, which
include instructional objectives requiring about 2 weeks of student
effort.

3. Teacher-developed, diagnostic-formative tests administered
frequently to assess each student's progress.

4. Corrective assignments, based on alternative materials and
activities, to provide learners with opportunity to achieve objectives
missed during the regular course of instruction.

5. "Mastery" achievement at each step of a unit prior to moving
ahead to the next portion of study.

6. A final "summative" examination over all course objectives
after all individual units in an instructional sequence have been
taught. The criterion for mastery is set at between 80 and 90%. The
results of this final test will determine a student's course grade.

Keller's Personalized System of Instruction

Keller's PSI approach is best described as a programmed system of
mastery learning.

1. Course objectives are divided into learning units; not more 
than 1 week of student time is required for each unit. 

2. The teacher establishes procedures for students to follow to 
master each unit. These procedures include study hints and guides,
written materials, and a test of the unit material. 

3. Students are directed to proceed through the units on a self- 
paced schedule. 

4. An examination is given following completion of each unit. 

5. A student who fails a unit examination is required to review
the same materials and to retake the examination for that unit. Review
and retesting continue until the unit's objectives are met with 100%
accuracy.

6. Final course grades are determined by the number of units an 
individual has completed. A specific number of units is set as a 
prerequisite for a passing grade. 
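
The six PSI steps listed above amount to a retest-until-mastery loop
followed by a count of completed units. The Python sketch below is an
illustration only and is not taken from Keller's plan: the function name
psi_progress, the representation of each test score as a fraction correct,
and the sample scores are all hypothetical. It also assumes that a student
cannot begin a unit until the preceding unit has been mastered.

```python
from typing import Dict, List

# PSI, as described above, requires 100% accuracy on each unit test.
MASTERY_CRITERION = 1.0


def psi_progress(attempt_scores: List[List[float]],
                 units_required: int) -> Dict[str, object]:
    """Trace one student's self-paced progress through PSI units.

    attempt_scores[i] holds the fraction-correct scores (0.0 to 1.0) the
    student earned on successive attempts at unit i (hypothetical data).
    units_required is the number of completed units set as the
    prerequisite for a passing grade.
    """
    units_completed = 0
    total_attempts = 0
    for scores in attempt_scores:
        mastered = False
        for score in scores:
            total_attempts += 1
            if score >= MASTERY_CRITERION:
                mastered = True  # criterion met; stop retesting this unit
                break
        if not mastered:
            break  # still reviewing this unit, so later units are not started
        units_completed += 1
    return {
        "units_completed": units_completed,
        "total_attempts": total_attempts,
        "passed": units_completed >= units_required,
    }


# Hypothetical student: masters unit 1 on the second attempt, unit 2 on the
# first attempt, and has not yet mastered unit 3 after two attempts.
example = psi_progress([[0.8, 1.0], [1.0], [0.9, 0.95]], units_required=2)
print(example)
```

Under these assumptions the final grade depends only on how many units
were completed, not on how many attempts were needed, which mirrors step 6.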

NOTED EFFECTS OF MASTERY LEARNING THEORY

Over a decade has passed since schools began implementing mastery
learning strategies. Because school systems throughout the world are
experimenting with mastery learning programs, a great deal of informa-
tion exists on the effectiveness of mastery learning (Block, 1974;
Block & Burns, 1977; Bloom, 1976; Kulik, Kulik, & Cohen, 1979; Torshen,
1977). Block (1979) notes, however, that research interests have shifted
in the last few years from looking for evidence of the effectiveness
of mastery learning to searching for an understanding of why the strategies
work.

Research on the Components of Mastery Learning

Cognitive Achievement. Probably the most frequently cited research
efforts on the relationship of student achievement to mastery learning
have been those involving a large number of Korean school systems (Block,
1974; Bloom, 1976; Torshen, 1977). Two Korean educators, Kim and Lee,
along with their colleagues, designed experiments to implement and
evaluate mastery learning strategies. Thousands of Korean students from
rural and urban areas were equally divided into mastery learning instruc-
tion and traditional instruction (control) groups and were taught in
several content areas. Results of the Korean investigations consistently
demonstrate the effectiveness of mastery learning research strategies
in producing positive cognitive growth. For example, Torshen's (1977)
summary of the Kim-Lee efforts alluded to a study involving 5,800 seventh
graders from several middle schools. The students received instruction
in English and mathematics under either a mastery learning or a nonmastery
(traditional) curriculum plan for 8 weeks. The minimum criterion for
mastery on postassessment measures was 80% correct. The following re-
sults were obtained: In English, 72% of mastery students and 28% of
nonmastery students met the criterion; in mathematics, 61% of mastery
students and 39% of nonmastery students met the criterion.

Retention of Learning. Students' retention of the knowledge and
skills learned in school has been the object of numerous research
studies. The ability to reach a predetermined mastery score is without
value if the knowledge is not retained and available for future appli-
cation. Most retention studies have a four-part design: (a) a unit
of material is presented to students; (b) a postassessment summative
test is administered; (c) the same postassessment instrument is admin-
istered after a time lapse of a few weeks to one or more years; and
(d) scores on the two identical tests are compared to note levels of
retention. Two trends can be noted in the results. First, students
in mastery learning programs did exhibit greater levels of retention
than matched groups taught under nonmastery programs (Block, 1972;
Romberg, Shepler, & King, 1970). Second, students who performed at
higher levels of mastery (90% or above) tended to exhibit greater
retention (Anderson, Scott, & Hutlock, 1976; Block, 1972; Poggio, 1976).
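
Step (d) of the four-part design, the comparison of scores on the two
identical tests, can be expressed in several ways. The Python sketch below
shows one simple possibility, the ratio of the delayed score to the
immediate postassessment score, averaged over a group. The ratio measure,
the function names, and every score in the example are assumptions made
for illustration; none of them is drawn from the studies cited above.

```python
from typing import List, Tuple


def retention_ratio(post_score: float, delayed_score: float) -> float:
    """Fraction of the immediate postassessment score still shown later."""
    if post_score <= 0:
        raise ValueError("post_score must be positive")
    return delayed_score / post_score


def mean_retention(pairs: List[Tuple[float, float]]) -> float:
    """Average retention ratio over (post_score, delayed_score) pairs."""
    if not pairs:
        raise ValueError("at least one (post, delayed) pair is required")
    ratios = [retention_ratio(post, delayed) for post, delayed in pairs]
    return sum(ratios) / len(ratios)


# Invented scores (percent correct), used only to show the comparison.
mastery_group = [(90.0, 81.0), (80.0, 72.0)]
nonmastery_group = [(80.0, 60.0), (60.0, 45.0)]

print(mean_retention(mastery_group), mean_retention(nonmastery_group))
```

A ratio is used here rather than a raw difference so that groups with
different postassessment levels can be compared on the same scale; a
study could equally report the difference in points.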

Transfer of Learning. Another area of concern is whether mastery
learning approaches aid students in the transfer of learning from one
class to another or from school-related to extracurricular situations.
Reports of several studies summarized by Block (1972) suggest that
students involved in mastery learning programs from kindergarten through
college level are successful at applying previously learned knowledge
to new courses of study. The notion of transfer is especially important
to the concept of mastery learning because a student must master one
unit prior to moving on to the next level. If mastery learning programs
did not promote transfer of learning, this unit-by-unit requirement
would be unwarranted (Block, 1974).

Affective Characteristics. Affective characteristics of the
learner are a major concern in the mastery learning programs based on
Bloom's model. As in all areas of mastery learning research, results
of studies measuring the affective domain must be viewed as tentative.
Many instruments available for measuring the affective domain are
limited in scope or are appropriate only when students are at certain
developmental stages. In addition, the relationships between the affective
domain and school achievement are complex and controversial (Torshen,
1977). Because teacher characteristics have a marked influence on
children's attitudes toward learning, teacher attitudes, as well as
student attitudes, must be considered.

Reports by teachers involved in mastery learning projects have
generally been positive (Barber, 1979; Hyman & Cohen, 1979; Torshen,
1977). Many teachers feel comfortable with the stability of the controls
built into mastery programs (Anderson et al., 1976). It also appears
that working with specific objectives and related materials increases
teachers' confidence in their own ability to teach. Teachers in mastery
learning classrooms seem to set higher expectations for their students
because it is inherent in the program that students work toward a
minimum level of competency.

Students enrolled in mastery learning classrooms indicate favorable
attitudes toward coursework and school in general (Anderson et al., 1976;
Block, 1972; Bloom, 1976), although less favorable attitudes were evi-
dent when the criterion for mastery was raised to 95% and higher
(Block, 1972). Attitudes toward coursework and school ultimately
relate to academic achievement; students who exhibit positive feelings
are apt to spend more time in study, and, hence, become more successful
learners.

Rate of Learning. Rate of learning has been defined as the time
a student devotes to a learning task. In order to receive the maximum
benefit from instructional activities, however, a student needs to be
actively involved in the task. Mastery learning programs appear to
increase the amount of time students spend actively engaged in learning
tasks (Anderson et al., 1976; Hyman & Cohen, 1979) and may, in fact,
help students make more efficient use of their time. Because consistent
feedback is an integral component of mastery learning programs, students
may experience an added incentive to complete tasks on time (Torshen,
1977).

ADDITIONAL CURRENT RESEARCH SUMMARIES

Since 1963, Hyman and Cohen (1979) have implemented and monitored
Learning for Mastery (LFM) programs in reading and mathematics in over
3,000 schools. Ten pedagogical conclusions suggested by the authors
are paraphrased below:

1. LFM was found to be consistently more effective in the
attainment of competencies than traditional curriculum (supported
extensively by Block, 1973).

2. The effects of LFM, rather than its effectiveness, should be
examined. Research is needed to investigate specific questions related
to what causes LFM models to be successful in helping students reach
competency requirements.

3. Increasing "time-on-task" increases the likelihood that a
student will achieve mastery; in fact, the best predictor of performance
is the amount of time students spend on learning tasks.

4. LFM students master more instructional objectives at a faster
rate than students in non-LFM classrooms, because objectives are care-
fully defined and students must continually demonstrate movement toward
mastery.

5. Mastery can be increased through active student participation
which is aided by: (a) carefully designed behavioral objectives which
guide the student and teacher; (b) direct teaching of identified objec-
tives; (c) providing immediate feedback to the student; (d) maximizing
use of positive feedback to instill a high self-concept; (e) minimizing
the size of unit tasks to promote closure; (f) controlling the materials
students use; and (g) positively reinforcing the learner's correct
responses.

6. Individualized LFM methods are more effective than group LFM
methods.

7. The popular notion of competency-based instruction (CBI) may
or may not include the LFM model. Several CBI programs are merely lists
of objectives with accompanying tests.

8. The goals of LFM are met only if students carry their skills
out into the world and use them efficiently and effectively.

9. LFM is learning-oriented, whereas most school programs are
teaching-oriented.

10. Classroom teachers can easily be trained to manage LFM
classrooms.

Burns (1979) carefully examined research reports on mastery learning
projects that had been collected and synthesized by Block and Burns
(1977) and by Kulik et al. (1979). The authors had presented collections
of research on three components of learning outcomes: cognitive achieve-
ment, retention, and affective achievement. Burns concluded that for
each component, the results favored mastery strategies over traditional
methods of instruction. Nevertheless, the question of whether "mastery
strategies work equally well for different kinds of learning and for
different types of students" still needs to be addressed (Burns, 1979).
LFM designs have been accused of promoting lower-level cognitive tasks
while ignoring higher-level learning. The research is unclear as to
which types of learners will benefit most from a mastery learning
approach.

Although most of the publicity on LFM has been favorable, some nega-
tive comments have also been noted. Glickman (1979), for example,
questions a basic tenet that most students have nearly the same
potential to achieve that which the schools have to teach. He notes
that research by Piaget, Bruner, and Elkind emphasizes the unique
qualities and developmental rates exhibited by individuals. Glickman
is also concerned that children who are forced to spend excessive time
mastering specific unit objectives may be denied the opportunity to
benefit from more appropriate and essential developmental tasks.
Finally, Glickman fears that LFM models, in aiming to develop equal
skills among all students, deny a basic premise of democracy--that is,
to encourage and promote the development of unique qualities within
each individual.



During the past decade, pressures from parents and the business 
community have forced educators to look carefully at what children are 
learning, how much they are learning, and the appropriateness of their 
school learning for real-life problems and needs. Reports of accounta- 
bility, minimal competencies, and competency-based education have 
flooded the popular press and professional journals. Societal demands 
on education have influenced 34 state legislatures to mandate the 
development of minimal requirements to be measured by competency testing
programs (Koenke, 1979; Rupley & Longnion, 1978).

Instruction in reading has been affected by the new demands placed
on educators. A survey of literature on reading throughout the 1970's
reveals numerous references to objective-based reading instruction and
to the establishment of minimal levels of reading proficiency required
for graduation. Changes have been made in instructional materials and
objectives, teaching methods and techniques, and assessment procedures.
The rationale underlying these changes is that each student be given
the opportunity to develop to his or her full potential in reading, and
thus be able to contribute to the growth of a stronger society. Bloom's
theory of mastery learning has provided the impetus for such
experimentation in reading education.

MASTERY LEARNING AND READING INSTRUCTION

Freebery (1978) applied Bloom's theory to the development of a reading 
program in Florida. Fourteen students, scoring below grade level and 
classified as disciplinary problems, were presented with reading instruc- 
tion based on mastery learning strategies. Freebery concluded that both 
improvement in reading achievement and discipline occurred in part from
using the mastery learning approach. 

Blohm (1978) designed a study to test the hypotheses that (a) reading
comprehension improves with the teaching of subskills, and (b) testing
of subskills would affect learning on a short-term basis. A group of
500 tenth-grade students was divided into two experimental groups and
one control group. One experimental group received instruction on reading
subskills followed by mastery testing; a second group was tested for
mastery of subskills but received no special instruction; and the control
group was not given any special instruction or testing in subskills.
An analysis of data gathered on a delayed posttest revealed that the
group receiving both instruction and testing in subskills scored
significantly higher in reading comprehension than the other two groups.
Although Blohm did not adhere strictly to the mastery learning approach,
the results of his study indicate that mastery of subskills does have
a positive effect on reading comprehension.

One of the most extensive applications of mastery learning theory
to reading instruction has been the Chicago Mastery Learning Reading
Program (Hannon, 1979). Chicago's mastery learning program, initiated
in 1975, faced complex political, social, and financial obstacles
(Katims, 1979). Among the problems were extreme cultural diversity of
students, pupil:teacher ratios nearing 35:1, and a limited budget.
After the program had been in effect for only 1 year, however, Smith
and Wick (1976) reported five positive results of the mastery approach
to learning:

1. Pupil rate of learning increased by 30%.

2. Higher achievers did exhibit a decrease in learning rate.

3. Variance among pupil scores decreased.

4. Correlations between prior ability and performance on formative
tests declined.

5. Teacher enthusiasm was high.

Furthermore, Katims, Smith, Steel, and Wick (1977) analyzed the results
of the Iowa Test of Basic Skills and reported that pupils in Chicago
receiving mastery learning instruction in reading had greater increases
in their scores than children in the control groups.

In summary, mastery learning designs have been successfully applied
to instruction in reading and could become a way of meeting public
pressures to demonstrate student achievement. One component of the
Chicago mastery plan was a management system used by teachers and
administrators to monitor reading skills development. Objective-based
management systems have become increasingly widespread; examination of
these systems reveals that the philosophy underlying their development
is very similar to that of mastery learning theory.

OBJECTIVE-BASED READING INSTRUCTION

Objective-based reading programs are generally intended to supplement
basic instruction. Usually the systems include (a) an identification
of subskills essential for competence in reading, (b) a listing
of objectives that must be taught and measured, (c) criterion-referenced
tests designed to assess students' skill development, (d) sources of
materials to use in teaching skill lessons, and (e) techniques for
recording progress in skill development (Stallard, 1977b). The
assumption behind these management programs is that reading is a
measurable entity and mastery of individual subskills will contribute
to overall reading achievement.

The use of management systems nearly parallels the mastery learning
strategies discussed earlier. First, students are given tests to
determine specific reading skill needs; next, prescriptive teaching and
learning occurs; and finally, posttests are given to determine the
efficacy of the teaching and learning. Children who have mastered a
set of objectives move ahead; those who fail to achieve mastery are
given additional time and corrective instruction. Mastery levels on
criterion-referenced tests, usually established by the publisher, tend
to center at about 80%. (See Stallard, 1977a, for an analysis of 15
widely used management programs.)

Objective-based reading instruction has been a controversial issue
among reading educators. Proponents (Duffy & Sherman, 1977; Otto &
Chester, 1976; Samuels, 1976) point to the strengths of this educational
innovation in terms of measuring individual skill development, focusing
instructional time, and recording and reporting student progress.
Opponents (e.g., Bagford, 1977) question the validity of the tests
used to determine skill mastery, the feasibility of identifying a
hierarchical arrangement of reading subskills, and whether reading can
be segmented into a myriad of subskills.

Duffy (1978) acknowledges the shortcomings of objective-based
instruction when carried to an extreme, but argues that it can be very
effective when used in moderation:

     Objective-based instruction is in essence a strategy for
     organizing the nuts and bolts of the reading curriculum
     into manageable systems useful to teachers and pupils.
     Carried to extremes or applied inappropriately, it can lead
     to disaster since reading is too complex to be completely
     captured in a set of skill objectives and teaching is an
     art requiring more than mere testing and teaching of skills.
     However, when applied flexibly and with a sense of balance,
     objective-based instruction can be a useful tool in the
     teacher's repertoire (p. 522).

MINIMAL COMPETENCIES IN READING 

The era of accountability, with its demands for concrete evidence
of mastery, has been instrumental in establishing and testing minimal
competencies. Well over half of the states require minimal skills or
"basic literacy" as a requirement for high school graduation.

The issue of minimal competency testing has been met with much
emotion by reading educators. In April 1979, the International Reading
Association warned that a single assessment of minimal competency
should never be used to determine student promotion or graduation. In
the IRA Board of Directors' recommendation, a statement was issued
recommending that instead of a single instrument, decisions be made
using a variety of diagnostic tools and that efforts be made to remedy
deficiencies based on the diagnosis.

Seymour (1979), taking issue with the IRA publication Minimal
Competency Standards: Three Points of View (Goodman, Farr, & Cassidy,
1978), proposes that schools should be held accountable for skill
development, and that government mandates should pressure schools into
producing more capable graduates. Seymour attributes student failure
to the absence of standards and to the lack of student concern in the
schools. He calls on all those concerned with education to "devise
and institute minimum standards that will help raise the competence of
students to levels indicating mastery of the basic skills" (p. 220).

McDonald (1978), another proponent of minimum competencies testing,
expresses concern about the influence that parent groups, school boards,
and state legislatures have had in establishing minimal competency
standards. He cautions reading specialists to take the leadership
roles in directing the development of those competencies.

Tierney (1978), an opponent of mandating minimum requirements and
of the accompanying competency testing programs, expresses concern that
the minimum may become the maximum requirement of students. He believes
that too many unanswered questions exist regarding testing, student
retention, and the improvement of basic skills. Farr and Roser (1974)
also question the rationale behind the growth of interest in testing
and suggest that tests are given to find out how well the educators have
done their jobs. Cassidy (1978), another opponent of establishing
minimal competency standards, addresses the issues raised by Purves
(1976): (a) the doubtful validity and reliability of the tests used
to assess reading competencies; (b) the fear that teachers will teach
to the tests; and (c) a lack of concern for cultural and language
diversity. Each of these issues reflects the belief that minimal
competency teaching and testing require careful examination by educators.

Demands by pressure groups throughout society have forced educators
to face the accountability issue. Responses have taken the form of
mastery learning strategies, objective-based reading programs, and the
establishment of minimal competency standards. All three responses
seem to have similar roots: Identify what needs to be learned and
establish related objectives, develop a way to instruct and assess
children on the objectives, and provide feedback and corrective
assistance so as many students as possible reach an established
standard of achievement.



The intent of this final section on mastery learning is to discuss
measurement and evaluation. A basic question is, "How do we determine
when a child has truly mastered a skill?" Information relevant to this
question has been divided into three categories: (a) determining the
appropriate type of assessment instrument; (b) developing adequate testing
devices; and (c) deriving, interpreting, and using test results.

DETERMINING THE APPROPRIATE TYPE OF ASSESSMENT INSTRUMENT

In order to make sound instructional decisions, it is necessary
to determine whether a child has mastered a particular skill.
Information on mastery or nonmastery provides instructional cues to the
teacher and the student; that is, mastery indicates that the student
should progress to the next unit of study; nonmastery indicates that
additional time and corrective instruction are needed. Accurate
information regarding a student's proficiency on specific subskills is
generally obtained from performance data on an appropriate assessment
instrument. Although both formal and informal assessment measures
provide the teacher with valuable data, the present discussion will
address formal means of assessment only, specifically norm-referenced
and criterion-referenced tests.

Norm-referenced Tests

According to Popham (1978), "a norm-referenced test is designed to
ascertain an examinee's status in relation to the performance of a group
of other examinees who have completed that test" (p. 24). In other words,
a norm-referenced test permits the examination of an individual's test
score in relation to the scores of his or her peers.

Norm-referenced tests (NRT) provide helpful information to educators
who need survey information or comparative growth data. Items selected
for inclusion in a norm-referenced test are usually of average difficulty
level. This results in a large spread of scores, or high response
variance, that is essential when comparisons are to be made (Farr &
Roser, 1974). Norm-referenced tests are especially useful for ranking
students in terms of aptitude, predicting students' potential, and
making comparisons between groups or individuals.

Definition and Uses of Criterion-referenced Tests (CRT)

Popham (1978) states that "a criterion-referenced test is used to
ascertain an individual's status with respect to a well-defined
behavioral domain" (p. 93). Criterion-referenced testing is based on
"the notion of a continuum of knowledge acquisition ranging from no
proficiency at all to perfect performance" (Glaser, 1963, p. 519). An
individual's score on a criterion-referenced test is an indication of
ability at a particular point in time on a specific unit of material.
A score on a criterion-referenced test measures a student's ability
(i.e., "How much does this child know?") in relation to a set of
objectives. Results of a criterion-referenced test can be used to
classify examinees as "masters" or "nonmasters" of an objective in
order to plan the next step of instruction (Berk, 1980).

The Development of Criterion-referenced Tests. Several essential
features must be included in the development of a criterion-referenced
test. First, the test manual should contain detailed descriptions of
the purported objectives of the test. In addition to helping teachers
understand what the test intends to measure, this list of specific
objectives is more apt to result in good item development. Because it
is not possible to include items which reflect every aspect of an
objective, a writer of a CRT must choose only those types of skills,
from a pool of alternatives, that best reflect mastery of an objective.
Once the appropriate skills have been designated, other test
specifications can be prepared: (a) general descriptions of the
behaviors to be measured; (b) sample items; (c) a list of stimulus
attributes (characteristics of the "stems" or stimuli used to lead
children to selecting an answer); (d) a list of response attributes;
and (e) specifications for any supplementary materials. The validity
and usefulness of the resulting CRT will depend on how carefully the
test specifications have been considered. Second, the items selected
for inclusion should be representative of the entire domain of
behaviors to be measured (Berk, 1980). The items must reflect their
respective objectives and discriminate between groups of masters and
nonmasters.

Appropriate test length is difficult to determine. If too few items
are included, test results may be unreliable; on the other hand, too
many items render the instrument cumbersome and inefficient. Hambleton,
Swaminathan, Algina, and Coulson (1978) point out that the number of
test items used to measure each objective will reflect the usefulness
of the test score. If the number of items is insufficient, the decision
leading to the mastery or nonmastery classification will be inconsistent
in test-retest situations. Lengthening the test will lower the chance
of miscalculating a student's status; however, lengthening the test
means decreasing instructional time and the tradeoff may not be
worthwhile.
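
The tradeoff between test length and classification consistency can be
illustrated with a simple binomial model. The model is an assumption
made here for illustration, not the procedure of Hambleton et al.: every
item is treated as an independent trial that the pupil answers correctly
with a probability equal to his or her true proficiency. The function
name, the 90% proficiency value, and the 80% cutoff are likewise
illustrative.

```python
from math import comb


def prob_classified_master(n_items, true_proficiency, cutoff_percent=80):
    """Probability that a pupil reaches the cutoff on an n-item test.

    Assumes independent items, each answered correctly with probability
    `true_proficiency` (an illustrative binomial model).
    """
    # Fewest correct answers that reach the cutoff; integer arithmetic
    # avoids floating-point rounding when computing the ceiling.
    min_correct = (cutoff_percent * n_items + 99) // 100
    p = true_proficiency
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# A hypothetical pupil whose true proficiency (90%) is above the cutoff.
for n in (5, 10, 20, 40):
    miss = 1 - prob_classified_master(n, 0.90)
    print(f"{n:2d} items: P(scored as nonmaster) = {miss:.3f}")
```

Under these assumptions, the chance of scoring this pupil as a nonmaster
is smaller on the 40-item test than on the 5-item test, which is the gain
that has to be weighed against the instructional time a longer test
consumes.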

In contrast to the norm-referenced test, in which the selection
of items is designed to yield a high response variance providing a spread
of scores, criterion-referenced measures are not designed to elicit
wide variance in performance. Techniques for determining reliability,
however, are available despite the limited variance. Reliability is
often established by test-retest score constancy techniques. The issue
of validity must also be considered in the development of
criterion-referenced tests. Popham (1978) suggests three validation
strategies: submitting test items with lists of test specifications to
a panel of experts, checking the outcomes with test predictions of
success, and systematically evaluating all possible domains which might
affect mastery of the objectives.

THE ISSUE OF PERFORMANCE STANDARDS 

In the mastery testing arena, it is expected that a standard for
performance be established. Such a standard, often referred to as a
mastery level, a cutting score, or a minimum pass level, is used by
teachers to determine a child's success or failure on a unit or set of
objectives. Although the establishment of performance standards has
been a part of educational evaluation for over 20 years (Torshen, 1977),
at the time of this writing no empirically based guidelines for
establishing mastery performance standards exist (Berk, 1980; Block,
1974; Bloom, 1976; Popham, 1978; Popham & Baker, 1970; Terwilliger,
1979; Torshen, 1977).

After using a particular testing instrument with several groups
of learners, experienced teachers will intuitively know what scores are
necessary for mastery; some flexibility in setting acceptable performance
standards is permissible. Establishment of minimal levels of competency
has depended on human judgment. As Popham (1978) noted, it is not as
if there were a "true and definitive minimal proficiency level lurking
out there if we [were] only clever enough to ferret it out. Minimal
performance levels will always be judgmentally based, and hence, subject
to the frailties of human judgment" (p. 167). Popham points out, however,
that performance standards must not be arbitrary or "off the wall,"
but, instead, should "rely on recent collateral data, wide-ranging input
from concerned parties, and systematic efforts to make sense out of
relevant performance and judgmental data" (p. 169).

Fortunately, progress has been made in establishing performance
standards, although the issue of setting mastery scores using systematic
procedures is far from resolved. Both the continuum and state models
summarized and discussed by Meskauskas (1976) have generated research
designed to develop quantitative models of standards setting. Continuum
models are based on a belief that each learner is at some point along
a path of knowledge acquisition and that a student's score is an indicator
of his or her present level of learning. State models, on the other
hand, describe mastery status in definite terms. In state models, there
is no room for "partial mastery" because mastery is defined as complete
knowledge. But, in both continuum and state models, human judgment
remains a factor in establishing standards for mastery.

Hambleton et al. (1978) have discussed the issue of determining
mastery states. These authors criticize the practice of comparing an
examinee's "domain score" to an established cutoff score on a
criterion-referenced test, thus classifying the examinee as a "master"
or "nonmaster." They (1978) suggest classifying examinees as masters,
partial masters, or nonmasters. Swaminathan, Block, and Ravitch (1972)
proposed that categories of cutoff scores be established and that pupils
be assigned to instructional settings appropriate to their performance.
Kriewall's (1972) model also has students categorized into groups along
the mastery continuum in what he terms "proficiency distributions."

Huynh (1976) applied a decision-theory framework in an attempt to
assign examinees to mastery status. Huynh's work also concerns Alpha
and Beta measurement errors based on the use of domain scores. At
present the studies of Huynh (1976) and Hambleton et al. (1978) are in
a developmental stage and must be investigated at length before putting
them to practical use.

Millman (1973) proposed two procedures for determining cutoff scores.
First, he suggested the cutoff score be set at the point where a
predetermined percentage of a given group of students would pass (or be
considered masters). This procedure has been rejected because it defied
one of the basic tenets of mastery testing: that an individual be
evaluated in terms of his or her personal performance on a set of
questions reflecting specific objectives. Millman's second suggestion
was to develop a criterion-referenced test for collecting a set of
scores from students who had already mastered a group of identified
objectives. A raw score selected from this group would then be
established as the "cutting score" for the test. But Millman also
believes that mastery levels do not have to remain as absolutes. Higher
cutoff scores may be feasible for fundamental skills; on the other hand,
the establishment of mastery levels may not be warranted for
nonessential skills. Adjusting or lowering cutoff scores is suggested
when remediation costs become prohibitive or the psychological effects
of mastery learning become taxing for the individual (Terwilliger, 1970).
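
Millman's first procedure, and the objection raised against it, can be
sketched in a few lines. The function name and both sets of scores are
hypothetical; the sketch only shows that a cutoff tied to a fixed passing
percentage moves with the group tested rather than with the objectives.

```python
def cutoff_for_pass_rate(scores, pass_percent):
    """Highest cutoff at which at least `pass_percent` of the group passes."""
    ranked = sorted(scores, reverse=True)
    # Number of pupils who must pass, rounded up (at least one pupil).
    n_pass = max(1, (len(ranked) * pass_percent + 99) // 100)
    # The score of the last pupil admitted becomes the cutoff.
    return ranked[n_pass - 1]


# Two hypothetical classes taking the same 20-item test.
group_a = [18, 12, 20, 15, 9, 17, 14, 19, 11, 16]
group_b = [10, 8, 12, 9, 7, 11, 6, 13, 5, 14]

cutoff_a = cutoff_for_pass_rate(group_a, 70)
cutoff_b = cutoff_for_pass_rate(group_b, 70)
print(cutoff_a, cutoff_b)
```

The same 70% pass rate yields a different cutoff for each group, so a
pupil's classification would depend on who else happened to be tested.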

Block (1972) designed a study to examine the effect of varying
cutoff scores during the course of instruction. Students who were
expected to achieve higher cutoff scores performed better on achievement
measures, transfer of learning, and retention of knowledge and skills.
In addition to these academic improvements, affective behaviors showed
an increase until the minimum pass level approached 85%. Results from
the Block study suggest that varying cutoff scores influences performance
and attitudes. Hence, it may be necessary for teachers to alter
expectations for particular groups of children.

The establishment of performance standards is important to student
achievement (Block, 1972, 1973). Imposing standards provides students
with motivation and results in increased achievement scores, especially
for those students with inconsistent study habits. Torshen (1977),
however, warns against excessive enforcement of inappropriate standards.
Imposing difficult or impossible standards could prohibit slow students
(perpetual nonmasters) from adequate exposure to school learning.
According to Torshen, when a student fails to achieve mastery even when
feedback and corrective procedures have been offered, teachers should:
(a) examine the objectives carefully to determine if mastery is essential
to the student's future learning; (b) consider an alternative means of
instruction that does not require mastery of the objective; or (c) move
the student ahead to a new area of work while continuing to offer
assistance in the area of difficulty.

Terwilliger (1970) raises an interesting issue regarding criterion
scores in mastery learning. If a mastery level of 80% is the performance
standard on a 10-item test, there are 56 ways of obtaining mastery
(8 or more items correct). As the number of test items increases, the
possible combinations of items that could equal mastery increases
drastically. Maintaining the criterion of 80%, but increasing the
number of test items to 20 would yield 6,196 possible response
combinations; raising the number of test items to 30 would yield 768,212
combinations (Terwilliger, 1970). Thus, the ambiguity of test results
increases with test length.
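
These counts can be checked directly. The sketch below assumes that a
"way of obtaining mastery" is a distinct set of correctly answered items
whose size meets the 80% criterion, so the count for an n-item test is
the sum of the binomial coefficients C(n, k) for every k from the cutoff
up to n. The function name is illustrative and does not come from
Terwilliger.

```python
from math import comb


def mastery_patterns(n_items, criterion_percent=80):
    """Count the distinct sets of correct items that meet the criterion."""
    # Fewest correct items that reach the criterion (integer ceiling).
    min_correct = (criterion_percent * n_items + 99) // 100
    # Each k contributes C(n, k) different sets of k correct items.
    return sum(comb(n_items, k) for k in range(min_correct, n_items + 1))


for n in (10, 20, 30):
    print(n, mastery_patterns(n))
# prints:
# 10 56
# 20 6196
# 30 768212
```

For the 10-item test the sum is C(10, 8) + C(10, 9) + C(10, 10) =
45 + 10 + 1 = 56, which matches the figure cited in the text.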

Terwilliger (1979) has suggested that teachers adopt a compromise
plan that involves either an adjustment of the mastery criterion level,
or a lowering of the quota of students expected to reach the set
standard. By making these adjustments, Terwilliger maintains that the
positive aspects of the mastery model can still be retained in classroom
instruction. For example, when examining course objectives, teachers
should determine which objectives all students should master and which
are advanced, complex objectives that only a portion of the students
can be expected to master.

In summary, it appears that mastery testing is best served by
criterion-referenced measures. Those measures, however, must be carefully
developed according to explicit test specifications based on the stated
objectives of a learning unit. The establishment of performance
standards remains an unresolved issue at this time, although guidelines
are available in the literature. Subjective opinions of educators
remain most popular in determining mastery levels (Popham, 1978;
Meskauskas & Webster, 1975; Levine & Forman, 197?).

FINAL EVALUATION OF THE WORD IDENTIFICATION
TEST BATTERY

In spring 1980, the final version of the Word Identification Test
battery and the Reading Subtest of the Metropolitan Achievement Tests
were administered to approximately 100 children in each grade level, one
through five. The primary purpose of the study was to examine
relationships between word identification skills, as measured by the
various subtests in the battery, and reading comprehension, as measured
by the standardized test of reading comprehension. Performance
guidelines for each of the five subskills assessed in the Word
Identification Test battery were determined from the results.

Method 

SUBJECTS 

A total of 644 pupils in grades one through five participated in
the study. The pupils were from three schools in the Madison
Metropolitan Public Schools, three schools in the Middleton School
District, and one school in Antioch, Illinois (see Table 1). Middleton
is a suburb adjacent to Madison, Wisconsin. Both Madison and Middleton
are communities with a middle to upper middle class socioeconomic
population. Antioch, located in the northern part of Illinois, close to
the Wisconsin border, has a mixed socioeconomic population. Many
residents of Antioch commute to the Chicago area or are employed in
local industries.
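
Table 1

Subject Population by School and Grade

                                       School districts
                         ---------------------------------------------------
                           Madison, Wis.    Middleton, Wis.    Antioch, Ill.
Grade                    (three schools)    (three schools)    (one school)      n
-----------------------------------------------------------------------------------
1                           --   --   --       34   43   23          --        100
2                           --   --   --       30   60   17          --        107
3 (Structure Subtests       49   --   44       --   --   --          --         93
  Only)
3 (Phonics Subtests         --   57   --       35   --   --          22        114
  Only)
4                           43   --   --       39   --   --          29        111
5                           --   --   --       49   70   --          --        119
-----------------------------------------------------------------------------------
Totals                      92   57   44      187  173   40          51        644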




In each school, testing was conducted in intact classroom groupings.
All classrooms were heterogeneous in academic ability, with pupils
representing a typical range of reading skills at each grade level.

STIMULI 

The Word Identification Test battery consists of five subtests within
two major skill areas, phonics and structural analysis. Two of the five
subtests, Consonants and Vowels, assess phonics skills; the remaining
three subtests, Inflected Endings, Affixes, and Contractions & Possessives,
assess structural analysis skills. All five subtests are group administered
paper and pencil tests. A discussion of the criteria guiding test
development, as well as a detailed description of each subtest, is
presented in Interim Report: The Refinement of the Test Battery to Assess
Word Identification Skills (Johnson, Pittelman, Schwenker, & Shriberg,
1980), The Assessment of Structural Analysis Skills (Johnson, Pittelman,
Schwenker, & Shriberg, 1979), and A New Approach to the Assessment of
Phonics Skills (Johnson, Shriberg, Pittelman, & Schwenker, 1979).

Phonics Subtests 

Consonants Subtest. The Consonants Subtest assesses 45 spelling-to-sound
correspondences for single-letter consonants, consonant clusters,
and consonant digraphs. Target sounds were selected according to their
frequencies of occurrence in the Venezky (Note 1) tabulations of the
20,000 most common English words. All single-letter consonant and
two-letter consonant digraph correspondences with frequencies exceeding
150, and all two-letter consonant cluster correspondences with
frequencies exceeding 110, were selected for inclusion in the Consonants
Subtest. Table 2 presents a list of the correspondences assessed, the
frequency of occurrence for each correspondence, and the position(s),
initial and/or final, in which each correspondence is assessed.

The Consonants Subtest is comprised of 90 items, two items assessing
each of the 45 correspondences. The subtest is designed to allow
teachers to assess children's performance by category (single-letter
consonants, consonant clusters, and consonant digraphs) rather than by
individual correspondences. For example, a student who made errors on
correspondences represented by cl-, -nt, -mp, gr-, and str-, would be
viewed as having difficulty with consonant clusters in general, rather
than with these five correspondences in particular.
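
The category-level reading of errors described above can be sketched as a
simple tally. The mapping below covers only a handful of the
correspondences listed in Table 2, and the function name and the sample
list of missed correspondences are hypothetical; the battery's actual
scoring procedure is not reproduced here.

```python
from collections import Counter

# Partial, illustrative mapping from correspondence to skill category.
CATEGORY = {
    "b": "single-letter consonant",
    "d": "single-letter consonant",
    "cl": "consonant cluster",
    "gr": "consonant cluster",
    "mp": "consonant cluster",
    "nt": "consonant cluster",
    "ch": "consonant digraph",
    "sh": "consonant digraph",
}


def errors_by_category(missed):
    """Tally missed correspondences by skill category."""
    return Counter(CATEGORY[c] for c in missed)


# Hypothetical pupil who missed items for these correspondences.
tally = errors_by_category(["cl", "nt", "mp", "gr", "sh"])
for category, count in tally.most_common():
    print(f"{category}: {count}")
```

A tally dominated by one category points the teacher toward that
category as a whole rather than toward the individual correspondences
that happened to be missed.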

Each item on the Consonants Subtest consists of a target synthetic
word (with the target letter(s) underlined) and four response choices in
picture form. The target synthetic word is a one-syllable word which
conforms to regular phonological rules of the English language.

The response choices for each item are pictures whose names are well
known to elementary school children. For each target item, the four
response choice categories are: a Correct response choice, an Acoustically
Close response choice, a Visually Close response choice, and a "Neither"
(neither acoustically close nor visually close) response choice. For
the six items assessing the three single-letter consonants having other

Table 2

Target Correspondences by Frequency and Position for Consonants

                                                  Total     Position assessed
Correspondence                   Example      frequency     Initial    Final
----------------------------------------------------------------------------
Single-letter consonants
    b                            bat              1,445     XX
    c (as /k/)                   cup              2,433     XX
    c (as /s/)                   cent               719     XX
    d                            dog              2,897     X          X
    f                            fox              1,064     X          X
    g (as /g/)                   goat               722     X          X
    g (as /j/ with silent e)     cage       [illegible]                XX
    h                            house              764     XX
    j                            [illegible]        214     XX
    k                            kite               3?5     X          X
    l                            lamp             3,67?     XX
    m                            mice             2,711     X          X
    n                            sun              4,599                XX
    p                            pet              1,811     X          X
    q (followed by u)            queen              192     XX
    r                            rake               971     XX
    s (as /s/)                   sink             2,171     X          X
    s (as /z/)                   boys               612                XX
    t                            tag              4,040     X          X
    v (with silent e             vest             1,534     X          X
      in final position)
    w                            witch              442     XX

Consonant clusters
  stop + liquid
    br                           broom              232     XX
    cl                           clip               184     XX
    cr                           crown              241     XX
    dr                           drum               136     XX
    gr                           grapes             275     XX
    pl                           plug               175     XX
    pr                           prince             549     XX
    tr                           train              401     XX
  liquid + stop
    lt                           belt               155
  fricative + liquid
    fl                           flag               160     XX
    fr                           frog               125     XX
    sl                           sled               114     XX
  nasal + stop
    mp                           lamp               274                XX
    nd                           hand               626                XX
    nt                           tent             1,304                XX
  nasal + fricative
    nc (followed by silent e)    fence              506                XX
  fricative + stop
    sc                           scale              162     XX
    sp                           spool              325     XX
    st                           [illegible]      1,054     X          X

Consonant digraphs
    ch                           chair              270     XX
    ng                           swing              401                XX
    ph                           phone              188     X          X
    sh                           ship               427     X          X
    th (voiceless)               wreath     [illegible]     X          X
----------------------------------------------------------------------------
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coininon sound corresponds- r'.:?-s — C/ Qj and the Acoustically Close cate- 



word siientiy to themselvc-c f/ic determine the- .ound represented by 
the underlined letter ( s) - Iv'er^, they are told to listen as the examiner 
reads the names of the foiir picture response choices for that item. 
Students are then instructed to draw a circle around the picture whoise 
name began (or ended) with the sound represented by the underlined letter (s 
in the target synthetic word. Figure 1 is a copy of the directions and 
practice items for consonant correspondences in initial position from 
the Consonants Subtest. 

Vowels Svibtest . The Vowels Subtest aissesses 29 different speliing- 
to-sound correspondences. Sounds selected for testing include the 5 short 
and 5 long v wels, 4 "other" frequently occurring singie-ietter vowels 
(i.e., single-letter vowels that corresponded to somds that are neither 
short nor long) , arid 14 two-letter vowel clusters. Target vowels sounds 
Were selected according to their frequencies of occurrence in the Venezky 
(Note 1) corpus of the 20,000 most conSon English words. The frequency 
ranges for the vowels selected for inclusion in the test, by category > 
are: short vowels = 1,458 to 7,554; long vowels == 508 to 1^870; "other" 
sirgie-ietter vowels = 117 to 243; and two-letter vowel clusters = 123 to 

e ____ __ " _ _ 

723. A list of the vowels assessed^ as well as of thexr frequency of 

^The frequency value of 723 t-epresents-the s\jmmed_frequencies of 333 
for or (as in porch) and 390 for or- (as in corn) . ftithou^ Venezky dif- 
fereiitiateis between the two correspondences, they are treated as one cor- 
respondence in the present study. 



gory was changed to * Gt^:;-' Couunon sound Correspondence. 



When taking tiie st^ 



are cirected to read the synthetic 
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PHONICS : Consonants 
Initial Position 

In each row, look at the made-up word. Notice that there is a letter 
or letters underlined at the beginhlht^ of that word. Read the word 
to yourse? f and decide how the underlined part sounds . Then listen 
carefully as I read the names of the pictures in the row and decide 
which picture name beg4ns with the sound of the underlined letter 
or letters in the made-up word. Draw a circle around that picture. 



A. 



B. 



C. 




Figure 1. Directionr emd practice iteons from the first page 
of the consonants Subtest for Initial Position. 
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occurrence and position (s) in words (median and/or f inai) , is presented 
in Tcd3les 3 and 4. 

The Vowels Subtest is comprised of 56 items, two items assessing each
of the 29 correspondences. As with the Consonants Subtest, it was
intended that children's performance be analyzed by category (in terms of
short vowels, long vowels, "other" single-letter vowels, and two-letter
vowel clusters), rather than by individual correspondences. For example,
a student who made errors on correspondences represented by ou, ai, oo,
and ow would be viewed as having difficulty with vowel clusters in
general, rather than with those four correspondences in particular.
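
The category-level reading of errors described above can be illustrated with a short sketch. It is not part of the test battery: the mapping `CATEGORY_OF` is a hypothetical, deliberately partial lookup table, and the function name `errors_by_category` is an assumption made for this example.

```python
from collections import Counter

# Hypothetical, partial mapping from a target correspondence to its item
# category; only a handful of correspondences are listed for illustration.
CATEGORY_OF = {
    "ou": "vowel cluster",
    "ai": "vowel cluster",
    "oo": "vowel cluster",
    "ow": "vowel cluster",
    "short a": "short vowel",
    "long o": "long vowel",
}


def errors_by_category(missed):
    """Count a student's missed correspondences per item category."""
    return Counter(CATEGORY_OF[c] for c in missed)


# The student from the example above, who missed ou, ai, oo, and ow.
profile = errors_by_category(["ou", "ai", "oo", "ow"])
print(profile)
```

Read this way, the four misses collapse into a single category count, which is the level at which the subtest results were meant to be interpreted.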

The format of the Vowels Subtest is the same as the format of the
Consonants Subtest; each item on the Vowels Subtest consists of a target
synthetic word (with the target letter(s) underlined) and four response
choices in picture form. The target synthetic word is a one-syllable
CVC or CCVC word which was constructed to conform to regular phonological
rules of the English language.

The response choices for each item are pictures whose names are well
known to elementary school children. For each target item, the four
response choice categories are: a Correct choice, an Acoustically Close
response choice, a Visually Close response choice, and a "Neither"
(neither acoustically close nor visually close) response choice.

As with the Consonants Subtest, students are first directed to
read the synthetic word silently to themselves and to determine the
sound represented by the underlined letter(s). Next, they are told to
listen as the examiner reads the names of the four picture response
choices for that item. Students are then instructed to draw a circle
around the picture whose name contains the same medial (or final) vowel
sound as the sound corresponding to the underlined letter(s) in the
target synthetic word. Figure 2 is a copy of the directions and practice
items from the Medial Vowels section of the Vowels Subtest.

Table 3

Target Correspondences by Frequency and Position for
Short, Long, and Other Single-letter Vowels

                                               Total        Position assessed
Correspondence                    Example      frequency    Medial     Final

Short vowels
  a                               hat          2,121        XX
  e                               dress        3,241        XX
  i                               fish         7,554        XX
  o                               mop          1,590        XX
  u                               drum         1,458        XX

Long vowels
  a                               [illegible]  1,870        XX
  e                               mete         503          XX
  i                               hive         968          XX
  o                               rope         1,292        XX
  u (a)                           flute        9[?]7        XX

Other single-letter vowels
  a                               ball         147          XX
  o                               glove        159          XX
  o                               dog          117          XX
  y (sometimes followed by
    silent e)                     fly          243                     XX

(a) While the sound of u in fuse (the diphthong [illegible] or iu) has a
greater frequency than the sound of u in flute (the simple vowel u), only
the latter letter-sound correspondence was used in the Vowels Subtest. This
is because there are few one-syllable picturable words with u as in fuse
that are well known to elementary school pupils. Except for when the
sound correspondence for the letter u occurs in initial position, the two
sound correspondences above for u are considered to be very similar.

Table 4

Target Correspondences by Frequency and
Position for Vowel Clusters

                                               Total        Position assessed
Correspondence                    Example      frequency    Medial     Final

Vowel clusters
  ai                              train        261          XX
  au                              taut         175          XX
  ar (a)                          barn         532          XX
  ay                              play         142          [position marks illegible]
  ea                              seal         320          XX
  ea                              bread        135          [position marks illegible]
  ee                              feet         294          XX
  er (a)                          fern         387          XX
  oo                              moon         198          XX
  or (b)                          porch        723          XX
  ou                              cloud        238          XX
  ow                              gown         123          XX
  ow                              snow         13[?]        X [partly illegible]
  ur                              purse        204          XX

(a) In this subtest, all vowel + r combinations are treated as vowel
clusters. The authors are aware, however, that er and ur are simple
vowels, whereas ar and or are vowel + r combinations.

(b) See footnote 5 in this chapter.

Structure Subtests 

Inflected Endings Subtest. The Inflected Endings Subtest assesses
seven different target inflected endings. The selection of inflected
endings was based on frequency information from the scope and sequence
charts of selected basal reading series, and on published tests for
inflected endings.

The Inflected Endings Subtest is comprised of 39 items, with three
to six items assessing each inflected ending. A primary factor in
determining the number of test items for assessing any particular inflected
ending was the frequency of occurrence in the language for that inflected
ending. Frequency information was gathered from the Ginn Lexicon Project
Frequency Listing (Johnson & Baumann, 1979) (see footnote 1), and the
American Heritage Word Frequency Book (Carroll et al., 1971) (see
footnote 2). The target inflected endings assessed in the subtest, as well
as the number of items for each target inflected ending, are presented
in Table 5.
The selection of the root words was also based on frequency
information from the Ginn Lexicon Project Frequency Listing
and the American Heritage Word Frequency Book. After the root words
were identified, their familiarity to young children was checked in The
Living Word Vocabulary (Dale & O'Rourke, 1976) (see footnote 4).

PHONICS: Vowels
Medial Position

In each row, look at the made-up word. Notice that there is a letter
or letters underlined in the middle of that word. Read the word to
yourself and decide how that letter or letters sound. Then listen
carefully as I read the names of the pictures in the row and decide
which picture name contains the sound of the underlined letter or
letters in the made-up word. Draw a circle around that picture.

Figure 2. Directions and practice items from the Medial Vowels
section of the Vowels Subtest.

Table 5

Target Inflected Endings Assessed
in the Inflected Endings Subtest

Target inflected ending                  Number of items

s (plural)                               5
(e)s (verb)                              6
ed                                       6
ing                                      3
er                                       3
est                                      3
y                                        2

Other target words
  tense marker (with vowel change)       1
  root (correct response)                10

Total Number of Items                    39

Each item on the Inflected Endings Subtest consists of a sentence
with a word missing and four response choices beneath the sentence. Two
of the four response choices are the root word with inflected endings;
a third response choice is the root word alone; and the fourth response
choice is the phrase "none of these."

Students are directed to read each sentence silently to themselves
and to select the response choice that best completes the sentence.
After circling a response choice, the children are to continue on to the
next item, and so forth, until all the test items are completed. Figure 3
is a copy of the directions and practice items from the Inflected Endings
Subtest.

Contractions & Possessives Subtest. The Contractions & Possessives
Subtest assesses nine different forms of contracted words and the
apostrophe s. The selection of contractions to be assessed was based on a
review of basal materials, the Wisconsin Design for Reading Skill
Development, which is a skills management system, and on frequency
information from the American Heritage Word Frequency Book. Commonly taught
contractions were identified and grouped into categories according to which
member of the word pair was contracted. For example, contractions of
will, such as I'll, we'll, he'll, and they'll, formed one category. Based
on frequency tabulations of contracted forms within these categories, the
categories were then rank-ordered and a proportionate number (see Table 6)
of specific contracted forms from each category were selected for
assessment.

Inflected Endings

Each of the sentences below has a word missing. Read each sentence to
yourself. Then carefully read each of the words below the sentence.
Draw a circle around the word that best completes the sentence. In
some cases, "none of these" may be the best answer because the
correct word is not given.

A.  Her piece of cake is ________ than mine.

    big          biggest          bigger          none of these

B.  Doctors were ________ at the hospital.

    need         needed           needing         none of these

C.  The puppy ________ out of the box.

    jump         jumper           jumping         none of these

Figure 3. Directions and practice items from the first page of the
Inflected Endings Subtest.

The Contractions & Possessives Subtest is comprised of 31 items, 21
items assessing contractions and 10 items assessing possessives. As in
the other Structure subtests, the number of items for assessing each
contracted form was based on frequency information. The target possessive
and contracted forms assessed in the subtest, as well as the number of
items for each form, are presented in Table 7.

Each item on the Contractions & Possessives Subtest consists of a
sentence which contains an underlined contracted or possessive word and
four response choices. The response choices for all the items which
assess a contracted form not ending in apostrophe s consist of (1) the
correct response choice; (2) and (3) two response choices consisting of
two-word combinations which could make sense in the context of the
sentence but which do not correspond to the contracted form; and (4) the
phrase "none of these." The formation of response choices for target
words ending in apostrophe s was the same, regardless of whether the
apostrophe s represented a possessive or a contraction. The four response
choices consisted of (1) the word "possessive"; (2) and (3) two response
choices consisting of one word of the two-word phrase which comprised
the target contracted form; and (4) the phrase "none of these." The only
exception in the format of the response choices was when "none of these"
was the correct answer.




Table 6

Frequency of Contractions (a) and Rank Order
of Contraction Categories

                         Contraction        Rank order
Contraction              frequency          by category

n't                                         [illegible]
  aren't                 239
  doesn't                590
  hasn't                 98
  shouldn't              [illegible]
  weren't                [illegible]
  won't                  756

's (is)                                     [illegible]
  it's                   2,178
  here's                 118
  what's                 482

'll                                         [illegible]
  it'll                  65
  they'll                120
  you'll                 524

've                                         [illegible]
  they've                53
  you've                 317

'd (had)                                    [illegible]
  I'd                    534
  we'd                   not listed

'd (would)                                  6
  she'd                  130
  it'd                   [illegible]

I'm                      1,848              7

's (us)                                     8
  let's                  892

're                                         9
  you're                 848

(a) Based on information from the American Heritage Word Frequency
Book.

Table 7

Target Forms in Contractions &
Possessives Subtest

Target forms             Occurrences

n't (not)                6
'll (will)               3
'd (would)               2
've (have)               2
'd (had)                 2
'm (am)                  1
's (us)                  1
're (are)                1
's (is)                  3
's (possessives)         10

Total                    31

Students are directed to read each sentence silently and to select
the response choice (provided beneath the sentence) that best completes
the sentence. After circling a response choice, the children are to
continue on to the next item, and so forth, until all the test items are
completed. Figure 4 is a copy of the directions and practice items from
the Contractions & Possessives Subtest.
Affixes Subtest. The Affixes Subtest assesses children's knowledge
of 18 target affixes: 8 prefixes and 10 suffixes. The selection of
affixes for assessment on the subtest was based on frequency information
gathered from scope and sequence charts of basal reading series, published
tests of affixes, and the SWRL Lexicon (see footnote 3). The Affixes
Subtest has a total of 54 items, with three items assessing each target
affix. A list of the 18 target affixes assessed in the subtest is
presented in Table 8.

The selection of the root words to be combined with the target
affixes was based on two criteria: (a) a root word had to combine with
at least two other affixes so that foils could be created for the test
items; and (b) the familiarity of the affixed root word to second, third,
and fourth grade pupils had to be as consistent as possible both within
and across test items. The Living Word Vocabulary was the source used
to determine the grade-level familiarity of the affixed words.

Each item of the subtest consists of a one- or two-line prose
description of the affixed word and four response choices. In order to
draw children's attention to the target root word, the root word is also
printed next to the item number. The four response choices consist of
the correct response and three foils of other affixed forms of the same
root word which do not fit the description or definition given in the
item stem.

Contractions and Possessives

Read each sentence below carefully. Then decide which meaning the
apostrophe mark (') has in the underlined word. Circle the choice given
below the sentence that tells the meaning of the underlined word. In
some cases, "none of these" may be the correct answer because the real
meaning of the apostrophe in the underlined word is not given.

A.  The cuckoo clock's been broken for a long time.

    possessive      clock is        clock has       none of these

B.  There wasn't any question that Andy was the best runner.

    possessive      was none        was no          none of these

C.  The wind's energy is used by windmills to raise water.

    possessive      wind has        wind is         none of these

Figure 4. Directions and practice items from the first page of the
Contractions & Possessives Subtest.

Table 8

Target Affixes Assessed in the Affixes Subtest

Prefixes                 Suffixes

re-                      -ful
non-                     -or/-er
dis-                     -less
un-                      -able
in-/im-                  -ment
sub-                     [illegible]
inter-                   -(e)ous
pre-                     [illegible]
                         -ness
                         -(t)ion

As with the other two structure subtests, children work independently,
reading the prose description for each item and then circling a response
choice. As soon as a child completes an item, he or she continues on to
the next item. Figure 5 is a copy of the directions and practice items
from the Affixes Subtest.

Procedure 

Each class participating in the study was given the designated
subtests from the Word Identification Test battery and the Reading Subtest
of the Metropolitan Achievement Tests. Table 9 indicates which of the
five Word Identification Subtests were administered at each grade level,
and the corresponding levels of the Reading Subtest of the Metropolitan
Achievement Tests. As shown in the table, fifth grade students received
only one subtest from the Word Identification Test battery and first grade
students were given two, while second, third, and fourth grade students each
received three of the five Word Identification Subtests. (Although it would
have been appropriate for third grade subjects to receive all five subtests,
an effort was made to limit the participation time per class.)

The subtests were administered in varying orders, depending on the
sequence that best fit the time allocations specified by the schools.

Affixes

Look at each row and read the word in the small box. This word is the
root or base word for the sentence. Now read the sentence. Below the
sentence are four answers, each containing the root word plus another
word part or parts. Draw a circle around the word that is described or
defined in the sentence.

A.  happy

    A word that describes a person who is not happy:

    unhappy        happiest       happily        unhappiness

B.  sweet

    A word that means the quality of being sweet:

    unsweetened    sweetness      sweeter        sweetest

C.  drive

    A word that describes a car that can not be driven:

    driving        driver         drivable       undrivable

Figure 5. Directions and practice items from the first page of the
Affixes Subtest.

Table 9

Subtests by Grade Level from the Word Identification Test
Battery and the Metropolitan Achievement Tests



Phonics Subtests: Consonants, Vowels

Structure Subtests: Inflected Endings, Contractions & Possessives, Affixes

Metropolitan Achievement Test (Form JI), Reading Subtest


X 
X 



X 
X 



Primary I 
Primary II 



A 
B 



X 
X 



X 
X 
X 



Elementary 
Elementary 
Elementary 
Intermediate

When the Phonics Subtests were administered, however, the Consonants
Subtest was always given before the more difficult Vowels Subtest.

An Administrator's Manual was prepared for each subtest, and the
directions for administering each subtest were read from the appropriate
manual. The procedure for administering each of the subtests is
described below.

CONSONANTS SUBTEST 

After the booklets for the Consonants Subtest were distributed and
the appropriate student identification information entered on the cover,
the examiner explained to the students that they would be listening for
consonants sounds at the beginning of words. The examiner then worked
with the students on three practice items.

For each practice item, students were directed to look at the
synthetic word in the box and to pronounce the word silently to themselves.
They were told to especially note the underlined letter(s) in the word,
and to determine the sound of that underlined part. Next, the examiner
read the names of the four pictures in the row. The students' task was
to circle the picture whose name began with the same sound as the sound
of the underlined part of the synthetic word.

Following discussion of the practice items (between 5 and 10 minutes),
the examiner paced the children through all the test items. The examiner
instructed students to "put your finger on Row # and say the made-up
word to yourself." The examiner then named the four pictures in the
row. This procedure was repeated for each item on the test.

When the Initial Consonants section of the Consonants Subtest was
completed, the examiner explained to the students that they would next
be listening for consonants sounds at the end of words. The examiner
worked with students on two practice items for final consonants sounds.
At the end of the practice period, the examiner led the children through
the actual test items, saying each picture name as before. Total test
time for both the Initial and Final Consonant sections (including practice
items) ranged from 30 to 55 minutes.

VOWELS SUBTEST 

The Vowels Subtest was administered in the same way as the
Consonants Subtest, except that students were instructed to listen to vowels
sounds in the middle of words (51 items) and at the end of words (5 items).
Practice items were provided for items in both positions. Total test
time (including practice items) ranged from 27 to 45 minutes for the
entire Vowels Subtest (both Medial and Final positions).

INFLECTED ENDINGS SUBTEST 

After the booklets for the Inflected Endings Subtest were distributed
and the appropriate student identification information entered on the
cover, students worked with the examiner on three practice items; the
first practice item was written on the chalkboard and also appeared in
the test booklet.

Students were then directed to work on the items independently
until they completed all six pages of the test booklet, after which
they could either draw a picture on the back cover of the booklet or
do quiet seatwork. Total time for the administration of the Inflected
Endings Subtest (including practice items) ranged from 17 to 36 minutes.

CONTRACTIONS & POSSESSIVES SUBTEST 

The Contractions & Possessives Subtest was administered in the
same way as the Inflected Endings Subtest, except that pupils were told
that the test booklet was about the two meanings of the apostrophe mark.
As in the Inflected Endings Subtest, the children did three practice
items with the examiner and then were directed to work independently on
the rest of the test booklet. The only notable difference in procedure
was that all three practice items from the Contractions & Possessives
Subtest were written both on the chalkboard and in the test booklets.
Total time for the administration of the Contractions & Possessives
Subtest (including practice items) ranged from 17 to 35 minutes.

AFFIXES SUBTEST

The procedure for administering the Affixes Subtest was the same
as the procedure followed for the two other structure subtests, except
pupils were told that the booklet was about prefixes and suffixes: word
parts that are added to the beginning or end of a word to change its
meaning or use in a sentence.

Students worked with the examiner on three practice items (none
of which was written on the chalkboard) and then were directed to work
independently on the remaining items in the booklet. Total time for the
administration of the Affixes Subtest (including practice items) ranged
from 23 to 35 minutes.

READING SUBTEST OF THE METROPOLITAN ACHIEVEMENT TESTS

After the booklets for the Reading Subtest of the Metropolitan
Achievement Tests were distributed and the required student information
entered on the cover, the examiner directed the children to open their
booklets to a specific page. Following a discussion of the sample items,
the examiner directed the children to work independently on the next
several pages until they reached the word "STOP." When the children
had completed all the items on the subtest, they were permitted to do
quiet seatwork of their choice. Total time for the administration of the
Reading Subtest (including the sample items) was approximately 40 minutes
for the first grade, and approximately 45 minutes for grades two through
five.

Each of the five Word Identification Subtests, as well as the
Metropolitan Reading Subtest, was administered separately, with a break
between subtests. Testing was conducted in one or two sessions, depending
on the number of subtests to be administered and the amount of class time
allocated. Testing sessions were usually scheduled over two separate
days although, in a few cases, the sessions were broken up by the lunch
period. For one school, the two sessions were one week apart. All
testing took place between March and May 1980.

With the exception of the school in Antioch, Illinois, all testing
was conducted by either specially trained personnel hired by the Wisconsin
Research and Development Center or by Project staff. In Antioch, the
tests were administered by classroom teachers under the supervision of
the Reading Consultant, a former member of the Project staff.



The primary purpose of the final study was to obtain word
identification and reading comprehension data to establish performance
guidelines for mastery on the Word Identification Test battery. In addition,
summary statistics on each of the word identification subtests were
calculated and are reported below.

PHONICS TEST

Summary statistics for students' performance on the two Phonics
Subtests (Consonants and Vowels) are presented in Table 10. As
anticipated, children performed better on the Consonants Subtest than on the
Vowels Subtest.

Table 10

Summary Statistics for the Consonants and Vowels Subtests
(n = 339)

                     Number    Mean                                             Hoyt estimate
Name of              of        %                    Standard   Range            of
subtest              items     correct    SD        error      % correct        reliability

Consonants           90        89.19      10.87     2.36       23.33 - 100.00   [illegible]
(Grades 1, 2, 3)

Vowels               56        72.35      25.40     2.81       7.14 - 98.21     [illegible]
(Grades 1, 2, 3)

T-tests for significant differences in performance due to sex and
grade level of subjects were performed on the data. Summary information
from these t-tests is presented in Tables 11 and 12. No difference in
performance due to sex of subjects was observed on either of the two
subtests. Grade level differences, on the other hand, were significant for
both the Consonants and Vowels Subtests with one exception: the performance
of second and third grade subjects was not significantly different on the
Consonants Subtest. This may be because by the end of second grade most
students have mastered the major consonants correspondences. This
suggestion is supported by the high mean score (92.11%) obtained by second
graders. At higher grade levels, therefore, relatively small changes in
consonant scores are observed.

In contrast, significant grade level increases in scores assessing 
the vowel correspondences are apparent on the Vowels Subtest. Besides 
the expected difference in performance of second grade over first grade, 
third grade performance shows a significant increase over second grade 
performance. Evidence from previous studies suggests that fourth grade 
students would achieve still higher scores on vowels correspondences, 
although the differences might not be statistically significant. 
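
The degrees of freedom reported in Table 12 equal n1 + n2 - 2 for each pair of adjacent grades, which is what a pooled-variance two-sample t-test gives. The report does not say which t-test variant was used, so the sketch below rests on that assumption. The group sizes and means are taken from Table 12 (Vowels Subtest, grades 1 and 2); the standard deviations are placeholders, because grade-level standard deviations are not reported, and the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent two-sample t statistic with pooled variance.

    Returns the t statistic and its degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means and group sizes from Table 12; the SDs (20.0 and 25.0) are
# placeholder values chosen only so that the example runs.
t, df = pooled_t(73.47, 20.0, 106, 48.85, 25.0, 99)
print(df)
```

With 106 second graders and 99 first graders, the degrees of freedom come out to 203, matching the value listed for that comparison. The t value itself depends on the placeholder standard deviations and should not be compared with the reported 11.51.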

Subjects' performance on the Phonics Subtests was also examined
by item categories within each subtest. This information, as well as
reliability estimates for the subtests, is presented in Tables 13 and 14
for Consonants and Vowels.

Table 11

Summary of t-tests for Differences
Due to Sex

Name of                         Mean %
subtest       Sex      n        correct     DF      t-value     Probability

Consonants    Boys     171      69.73
              Girls    168      69.85       337     .15         .878

Vowels        Boys     160      73.02
              Girls    156      72.87       314     .07         .945

Table 12

Summary of t-tests for Differences
Due to Grade Level

Name of       Grade             Mean %
subtest       level    n        correct     DF      t-value     Probability

Consonants    1        119      81.16
                                            223     6.58        .000
              2        106      92.11
                                            218     .69         .494
              3        114      94.31

Vowels        1        99       48.85
                                            203     11.51       .000
              2        106      73.47
                                            215     3.06        .003
              3        111      79.25

Table 13

Summary Statistics for Consonants Subtest
by Item Category and Grade Level
(n = 339)

Item category                      Number of items    Grade    Mean percent correct

Single-letter consonants           36                 1        70.10
                                                      2        71.15
                                                      3        71.22

Consonant clusters                 38                 1        66.39
                                                      2        75.15
                                                      3        74.56

Consonant digraphs                 10                 1        70.67
                                                      2        81.32
                                                      3        83.33

Variant single-letter              6                  1        21.43
consonants (c, g, and s)                              2        41.98
                                                      3        46.93

Total test                         90

Table 14

Summary Statistics for Vowels Subtest
by Item Category and Grade Level

Item category                      Number of items    Grade    Mean percent correct

Long vowels                        10                 1        44.34
                                                      2        71.98
                                                      3        73.87

Short vowels                       10                 1        49.49
                                                      2        73.96
                                                      3        77.48

Vowel clusters                     28                 1        64.47
                                                      2        89.86
                                                      3        98.13

Other single-letter vowels         8                  1        33.71
                                                      2        53.77
                                                      3        59.23

Total test                         56

The rank-ordered listing of mean percent correct scores by consonant
category differs somewhat from results of the previous study: Children
in both studies performed poorest on variant single-letter consonants
(i.e., those consonants letters, c, g, and s, having more than one
common sound correspondence). In the present study, performance was
highest on the consonant digraph category rather than on the
single-letter consonants category, which had shown highest performance in the
previous study. The second and third grade subjects in the present
study performed considerably better on the items assessing digraphs
than they did on items assessing single-letter consonants. Whether this
was due to more recent instruction on digraphs or to chance was not
determined. Overall, for the present subtest, mean percent correct on
consonant categories ranged from 78.44 for digraphs to 36.78 for variant
single-letter consonants.

On the Vowels subtest^ perfonnahce on categories followed an iden- 
tical pattern to that observed in the previous study: Children did best 
on vowel clusters and least well on variant vowels correspondences 
(vowel letters which correspond to a niamber of frequently occurring 
sounds). Mean percent correct scdres ranged from 84.15 for items 
assessing vowel clusters, td 48.90 for the variant other singie-iettef 
vowel category. Although performance increases with grade level across 
all categories, the most notable feature of the data is the very large 
jump in mean performance on all item categories between first and second 
grades (see Table 14) . ftltiiou^ perfo rman ce for all categories is yet 
higher for third grade, tiie mean differences are not as great. Mastery 
of vowels correspondences appears to occur somewhat later fpr niost 
students than does the mastery of consonants correspondences. 
As discussed earlier, the Phonics Subtests were designed to facilitate
the analysis of error patterns: response foils were carefully
created using speech production and confusion matrix information for
each subtest item. The item analysis data were tallied item by item in
order to obtain selection rates for subjects' errors. These error
selection rates are presented by grade in Table 15 for the Consonants
Subtest and in Table 16 for the Vowels Subtest.

For the Consonants Subtest, the foil categories were Acoustically
Close (Acoustic), Visually Close (Visual), the other frequently occurring
sound of a variant consonant (Other), or Neither Acoustically Close nor
Visually Close to the target letter(s) or sound (Neither). Rates of
selection for these foil categories were consistent across all three
grade levels: "Other" foils had the highest rate of selection, and
"Neither" foils had the lowest rate. The very low rate of selection
for "Neither" foils across all three grade levels is interesting. A
selection rate of approximately 25% for this foil type would indicate a
pattern of random guessing. As in the previous study, subjects at all
grade levels tended to be strategic, rather than random, in their
selection of responses.
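
The comparison of a foil's selection rate with the chance rate can be made explicit. What follows is only a minimal sketch, assuming four response choices per item (so that a student guessing at random would pick any one choice about 25% of the time). The function names, the `n_choices` default, and the `tolerance` value are hypothetical and are not part of the original analysis.

```python
def selection_rate(times_selected, times_presented):
    """Percent of presentations on which a foil was chosen."""
    if times_presented <= 0:
        raise ValueError("times_presented must be positive")
    return 100.0 * times_selected / times_presented


def looks_like_guessing(rate_percent, n_choices=4, tolerance=5.0):
    """True if a foil's selection rate is close to the chance rate.

    The chance rate is 100 / n_choices. Both n_choices=4 and
    tolerance=5.0 are assumptions made for illustration only.
    """
    chance = 100.0 / n_choices
    return abs(rate_percent - chance) <= tolerance


# First graders selected "Neither" foils 1.3% of the time (Table 15),
# far below the 25% expected under random guessing.
print(looks_like_guessing(1.3))
```

A rate far below chance, as here, is what the text describes as strategic rather than random responding.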

For the Vowels Subtest, the foil categories were Acoustically Close
(Acoustic); Visually Close, that is, either one of the two letters for a
target cluster or digraph (1 of 2); another common sound correspondence
for single-letter vowels (Other); and Neither Acoustically Close nor
Visually Close to the target letter(s) or sound (Neither). As with the Consonants
Table 15

Percent of Times Each Consonant Foil Was Selected

            Visual           Acoustic      OR      Other           Neither
Grade    (present in       (present in          (present in      (present in
          90 items)         79 items)            11 items)        90 items)

  1         7.26%             4.26%                46.9%            1.3%

  2         2.17%             1.28%                36.9%            0.33%

  3         1.54%             0.91%                31.2%            0.22%

Note. The percentage figure indicates the number of times the foil was incorrectly selected.
Table 16

Percent of Times Each Vowel Foil Was Selected

           Acoustic           Other       OR    Visual (1 of 2)    Neither
Grade    (present in       (present in          (present in      (present in
          56 items)         25 items)            31 items)        56 items)

  1        11.23%            31.84%               17.35%            4.27%

  2         8.20%            24.2%                12.3%             1.84%

  3         6.69%            22.16%                8.61%            1.25%

Note. The percentage figure indicates the number of times the foil was incorrectly selected.
Subtest, the rates of selection for the various error categories in the
Vowels Subtest were completely consistent across all three grade levels
tested. The highest rate of selection was observed for the "Other"
category, and the lowest rate was for the "Neither" category.

Because error analyses are generally done on the responses of
individual students, rather than tallied for groups, five second grade
subjects were randomly selected from each of three score ranges for the
Vowels and Consonants Subtests and their errors were tallied. Students
scoring one standard deviation above and below the mean comprised the
High and the Low groups, respectively. Students scoring at or very near
the mean comprised the Average group. The students' errors were tallied
individually and then summed by error type.
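
The grouping just described can be sketched in code. This is an illustration under stated assumptions, not the procedure actually used: "one standard deviation above and below the mean" is read as at least one standard deviation away, "at or very near the mean" is read as within a hypothetical `near` fraction of a standard deviation, and the function names are invented for the example.

```python
import random
import statistics


def score_groups(scores, near=0.25):
    """Split {student: score} into High, Average, and Low groups.

    High: at least 1 SD above the mean. Low: at least 1 SD below.
    Average: within `near` SDs of the mean (`near` is an assumed cutoff).
    Students between these bands belong to no group.
    """
    mean = statistics.mean(scores.values())
    sd = statistics.stdev(scores.values())
    if sd == 0:
        raise ValueError("all scores are identical; no groups can be formed")
    groups = {"High": [], "Average": [], "Low": []}
    for student, score in scores.items():
        z = (score - mean) / sd
        if z >= 1.0:
            groups["High"].append(student)
        elif z <= -1.0:
            groups["Low"].append(student)
        elif abs(z) <= near:
            groups["Average"].append(student)
    return groups


def sample_per_group(groups, k=5, seed=0):
    """Randomly draw up to k students from each group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, min(k, len(members)))
            for name, members in groups.items()}
```

Calling `sample_per_group(score_groups(scores), k=5)` on a dictionary of second grade subtest scores would yield five students per score range whenever each range holds at least five.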

A breakdown of children's error patterns for the Consonants Subtest
is presented in Table 17. The patterns do not differ markedly from
those observed for the entire sample: the "Other" response foil had
the highest percentage of selection, whereas the "Neither" response foil
had the lowest. A notable observation is that high scoring subjects
making errors did not select the visual response foils. The fact that
average and low scoring subjects frequently selected the visual foil
may indicate a tendency for visual matching of letters; that is, average
and low scoring subjects may have mentally spelled out the picture names
and then proceeded to "match" the letter(s) of the picture name to the
letter(s) of the target synthetic word.

The findings were similar for the Vowels Subtest: the "Other"
response foil was most frequently chosen by subjects in all three score
Table 17

Number of Times Consonant Foils Were Selected
by Each Score Group
(n = 5 per group)

           Visual             Acoustic      OR    Other              Neither            Total number
                                                                                        of errors
           present in         present in          present in         present in
           90 items           79 items            11 items           90 items
           (number possible   (number possible    (number possible   (number possible
Group      responses = 450)   responses = 395)    responses = 55)    responses = 450)

High         0 (0%)             1 (0.2%)           16 (29.1%)          0 (0%)               17

Average     33 (7.3%)          18 (4.5%)           24 (43.6%)          4 (0.9%)             79

Low         71 (15.8%)         37 (9.4%)           18 (32.7%)         26 (5.8%)            152

Note. The percentage figure in parentheses indicates the percentage of times the foil was incorrectly
selected.
ranges, whereas the "Neither" response foil was selected the least (see
Table 18). Most of the errors made were on items assessing the long or
short sound of a single-letter vowel. In such items, the "Other" response
foil represented the long or short pronunciation counterpart of the target
vowel sound (e.g., long o, if short o were the correct answer). One
interpretation as to why children made so many errors on this type of
item may be that many of these subjects had not yet learned that the
silent e is a marker for long vowels. As expected, few subjects made
random guesses (as indicated by the low selection of the "Neither" response
foil); the percentage of selections of the "Neither" response foil was
greatest for children who performed the poorest on the Vowels Subtest.

Summary statistics for the three Structure Subtests (Inflected Endings,
Contractions & Possessives, and Affixes) are presented in Table 19. As
indicated in the table, children performed best on the Inflected Endings
Subtest. The high performance on this subtest was anticipated, because
Inflected Endings is one of the earliest components of structural
analysis to be taught in most reading programs.

t-tests for significant differences in performance due to sex and
grade level of subjects were also performed for each subtest. Summary
information from these t-tests is presented in Tables 20 and 21,
respectively. The only case in which a significant difference due to sex was
observed was in the Inflected Endings Subtest (which was administered
to second, third, and fourth grade subjects): Girls outperformed boys






Table 18

Number of Times Vowel Foils Were Selected
by Each Score Group
(n = 5 per group)

           Acoustic           Other         OR    Visual (1 of 2)    Neither            Total number
                                                                                        of errors
           present in         present in          present in         present in
           56 items           25 items            31 items           56 items
           (number possible   (number possible    (number possible   (number possible
Group      responses = 280)   responses = 125)    responses = 155)   responses = 280)

High         5 (1.8%)           7 (5.6%)            3 (1.9%)           0 (0%)               15

Average     21 (7.5%)          48 (38.4%)          25 (16.1%)          7 (2.5%)            101

Low         41 (14.6%)         47 (37.6%)          39 (25.2%)         22 (7.8%)            149

Note. The percentage figure in parentheses indicates the percentage of times the foil was incorrectly
selected.
Table 19

Summary Statistics for Structure Subtests

                                    Number     Mean                  Standard    Range of        Hoyt estimate
Name of subtest              N      of items   % correct     SD      error       % correct       of reliability

Inflected Endings           306       39        88.90       14.34     1.65       20.51 - 100         .91
(Grades 2, 3, 4)

Contractions &              201       31        83.89       15.55     1.77       35.48 - 100         .86
Possessives
(Grades 3, 4)

Affixes                     319       54        83.40       12.88     2.24       22.22 - 100         .84
(Grades 2, 3, 4)
Table 20

Summary of t-tests for Differences Due to
Sex on Structure Subtests

Name of                                Mean
subtest            Sex        N        % correct      df      t-value     Probability

Inflected          Boys      137        86.36         304       2.82         .005
Endings            Girls     169        90.96

Contractions       Boys       87        83.27         199        .74         .500
& Possessives      Girls     114        82.92

Affixes            Boys      142        83.09         317        .39         .694
                   Girls     177        83.66
Table 21

Summary of t-tests for Differences Due to
Grade Level on Structure Subtests

Name of            Grade               Mean
subtest            level      N        % correct      df      t-value     Probability

                     2       106        79.41
                                                      194       5.86         .000
Inflected            3                  92.28
Endings                                               198       2.74         .007
                     4       110        95.27

                     3        92        80.79
Contractions                                          199       2.64         .009
& Possessives        4

                     3        93        77.26
                                                      198       3.38         .001
Affixes              4       107        83.99
                                                      224       2.82         .005
                     5       119        87.68
with a resultant t-value of 2.82 with probability = .005. This finding
was not surprising, since girls generally outperform boys in reading tasks
in the primary grades. All grade level differences were significant. It
is interesting to note the large jump in performance between second and
third grade subjects on the Inflected Endings Subtest (Table 21) in
contrast to the minimal rise in mean scores between third and fourth grade
subjects on this subtest. This may be attributed to the fact that many
inflected endings are taught in third grade.

Another purpose of the study was to confirm the reliability of test
items. An item analysis indicated that, for all three Structure Subtests,
each individual item showed reliability with the total subtest; hence,
revisions of the subtests are not needed. (Reliability estimates for
each subtest are presented in Table 19.) Students' performance on each
item category within each of the three Structure Subtests was also examined.
This performance information is presented in Tables 22, 23, and 24 for
Inflected Endings, Contractions & Possessives, and Affixes, respectively.

In general, the rank-ordered listing of mean percent correct by
category within the Inflected Endings Subtest confirms results of the
previous study: Children performed best on items assessing the ing
endings (mean percent correct = 97.9) and, excluding items in which the
correct response choice was "none of these," least well on items assessing
the tense marker (mean percent correct = 80.4). In the previous
administration of the Inflected Endings Subtest, performance was generally
poorest on items with a correct response of "none of these." As indicated in
Table 22, this was also true in the current study.
Table 22

Rank-ordered Listing of Mean Percent Correct
on Item Categories in Inflected Endings Subtest
(n = 306)

Item category                Number of items      Mean percent correct

ing                                 3                    97.9

root                                8                    92.8
(correct response)

(e)s (plural)                       4                    92.5

y                                   2                    92.3

(i)ed                               5                    90.8

er                                  3                    90.8

est                                 2                    88.7

(e)s (verb)                         5                    85.5

tense marker                        1                    80.4
(with vowel change)

"none of these"                     6                    77.4
(correct response)

Total test                         39                    77.4
Table 23

Rank-ordered Listing of Mean Percent Correct
on Item Categories in Contractions & Possessives Subtest
(n = 201)

Item categories              Number of items      Mean percent correct

am                                  1                    98.5

is                                  3                    96.3

will                                2                    95.5

are                                 1                    95.0

have                                1                    95.0

would                               2                    83.1

not                                 3                    82.6

"none of these"                     7                    81.2
(correct response)

possessives                         8                    79.7

                                    1                    75.6

had                                 2                    68.2

Total test                         31                    83.9
Table 24

Rank-ordered Listing of Mean Percent Correct
on Item Categories in Affixes Subtest
(n = 319)

Item categories              Number of items      Mean percent correct

Prefixes

  non-                              3                    95.7
  re-                               3                    93.7
                                    3                    93.7
  dis-                              3                    87.6
  in-/im-                           3                    87.0
  sub-                              3                    78.2
  pre-                              3                    70.9
  inter-                            3                    52.2

Suffixes

  -ful                              3                    93.7
  -able/-ible                       3                    93.5
  -less                             3                    92.5
  -or/-er                           3                    91.8
  -ment                             3                    90.2
  -en                               3                    88.0
  -ly                               3                    83.2
  -ness                             3                    79.2
  -(i)ous                           3                    70.0
  -(t)ion                           3                    59.9

Total test                         54                    83.4
A similar pattern of poor performance on items with the correct
response "none of these" also had been noted in the previous
administration of the Contractions & Possessives Subtest. To evaluate the impact
of the "none of these" foil on the Contractions & Possessives Subtest,
revisions were made in the response choices of four items prior to the
final administration of the test in the current study: A response choice
in one item having the correct answer as "none of these" was changed so
that the correct contracted form was present; the response choices for
two items in which the correct contracted form was present were changed
so that the foils "none of these" became the correct answers; and the
correct response choice for one possessive item was changed to "none of
these" (formerly, there were no possessive items with a correct response
of "none of these").

A comparison of performance on these items on the two test versions
indicates that children do perform differently when the correct answer is
not present (i.e., when "none of these" is the correct response choice),
as opposed to when the correct answer is among the response choices.
Because of this effect, items with "none of these" as the correct response
are grouped as a separate category in Tables 22 and 23 rather than being
incorporated into the appropriate item category. (The response choice
"none of these" was not present in the Affixes Subtest.)

As in the previous study on the Contractions & Possessives Subtest,
children performed best on the contracted forms of am and is. Mean
percent correct scores on the contraction categories ranged from 98.5 (for
am) and 96.3 (for is) to 68.2 (for had).






Overall performance on the prefixes and suffixes categories in the
Affixes Subtest varied according to grade level. As indicated in Table
25, third grade students performed slightly better on suffixes than on
prefixes. This finding is in line with the result of the previous study
in which second and third grade students performed better on items
assessing suffixes than on items assessing prefixes. Notably, however,
fourth and fifth grade subjects performed slightly better on prefixes
than on suffixes.

In the rank-ordered breakdown of items within the prefix category,
students across all three grade levels performed best on the prefix non-
(mean percent score = 95.7), and poorest on the prefix inter- (mean
percent score = 52.2). Relative performance on items within the prefix
category differed from the previous study as a result of extensive
revisions made for the present version of the subtest. Within the suffix
category, performance in the current study was best on the suffix -ful
(mean percent score = 93.7), and poorest on -(t)ion (mean percent score =
59.9). Although relative performance on items for these two suffixes
was the same as in the previous study, the rank orderings of the other
suffix items varied considerably because of the revisions made for the
present test.

Overall performance on the Affixes Subtest was considerably higher
for all categories than was observed in the previous study. This was
attributed to the fact that in the current study, the test was
administered to older students.



Table 25

Performance on Prefixes and Suffixes by Grade Level

Grade        Number of items            Mean percent correct

                                      Prefixes        Suffixes

  3                93                  75.81           76.06

  4               107                  82.90           82.34

  5               119                  87.08           85.77



In summary, total group performance on all three Structure Subtests
was generally similar to total group performance on the previous version.
Exact rankings of specific categories varied, however, due to differences
in many of the test items and the ages of the subjects in the two studies.
In the previous study, all subtests were given to second and third grade
students. In the current study, it appeared most appropriate to give the
Inflected Endings Subtest to second, third, and fourth grade subjects;
the Contractions & Possessives Subtest to third and fourth grade subjects;
and the Affixes Subtest to third, fourth, and fifth grade subjects.




THE ESTABLISHMENT OF PERFORMANCE GUIDELINES
FOR THE WORD IDENTIFICATION TEST BATTERY

As the review of mastery learning theory indicated, most
criterion-referenced assessment devices discriminate between children who have
mastered specific subskills and those who require additional instruction
and practice. Typically, cutoff scores or mastery levels have been set
by publishers of tests and tend to be absolute. For example, in many
skills management programs, a score of 80% or better indicates mastery
of a skill. Recently, however, Terwilliger (1979) pointed to a need for
flexibility in setting performance standards, because the arbitrary
designation of one uniform standard for all tests has not been empirically
justified. This section discusses the development of performance
guidelines for the Word Identification Test battery in relation to issues of
skills mastery. These guidelines reflect the concern expressed by
Terwilliger for flexibility in setting mastery standards. In addition
to the data obtained on specific subtests, the criteria for mastery take
into account children's reading comprehension performance and grade level.

The establishment of the performance guidelines for the Word
Identification Test battery was based on an innovative, two-stage process.
First, students participating in the final administration of the Word
Identification Test were stratified into three comprehension ability
groups: low, average, and high. These ability groups were formed on the
basis of children's scores on the Reading Subtest of the Metropolitan
Achievement Tests, a standardized measure of reading comprehension.
Children in the low comprehension group scored below their respective
grade level, children in the average group scored at grade level, and
children in the high group scored above grade level. The concept
underlying this technique is Glaser's (1963) assertion that a
criterion-referenced test is based on the notion of a continuum of knowledge
acquisition ranging from no skill to perfect performance.

Second, mean scores were calculated for the three comprehension
groups at each grade level for every subtest in the Word Identification
Test battery. These subtest standards are presented in Table 26. Because
the individual subtest scores are composites of the categories of items
comprising them, additional performance standards were calculated
separately for all categories within each subtest. These standards for
subskill categories are listed in Tables 27 and 28 for the Phonics and
Structure Components, respectively.
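
The two-stage process can be sketched as follows. This is a minimal illustration rather than the project's actual computation: it assumes the comprehension measure is available as a grade-equivalent score, it treats "at grade level" as a grade equivalent falling within the student's current grade, and the function names and record layout are hypothetical.

```python
from collections import defaultdict


def comprehension_group(grade_equiv, grade):
    """Stage 1: Low = below grade level, Average = at, High = above.

    Reading "at grade level" as grade <= grade_equiv < grade + 1
    is an assumption made for this sketch.
    """
    if grade_equiv < grade:
        return "Low"
    if grade_equiv < grade + 1:
        return "Average"
    return "High"


def performance_standards(records):
    """Stage 2: mean percent correct per (subtest, grade, group).

    Each record is a tuple (subtest, grade, grade_equiv, percent_correct).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for subtest, grade, grade_equiv, pct in records:
        key = (subtest, grade, comprehension_group(grade_equiv, grade))
        totals[key][0] += pct
        totals[key][1] += 1
    return {key: round(total / n, 1) for key, (total, n) in totals.items()}
```

The same function serves for subskill standards if the first field of each record names an item category instead of a whole subtest.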

The rationale for including performance guidelines for subskill
categories on the subtests is to provide teachers with information
about areas of both strength and weakness within each subtest. Typically,
a student receives one score for an entire subtest. But of what use to
a teacher is a score of 50%? While such a score may indicate the need
for more practice, it does not provide the teacher with information as
to the specific areas of weakness. Standards for subskill categories,
on the other hand, enable a teacher to use test information diagnostically.
For example, with specific scores on subskill categories within the



Table 26

Performance Standards for Low, Average, and High Comprehenders on
Subtests in the Word Identification Test Battery

                                         Comprehension group

Subtest                  Grade        Low        Average       High

Phonics

  Consonants               1          51.2        60.7         62.9
                           2          62.7        65.6         69.1
                           3          63.5        68.6         69.8

  Vowels                   1          40.1        47.4         58.9
                           2          61.2        69.3         75.9
                           3          54.2        71.2         80.8

Structure

  Inflected                2          49.7        70.5         89.0
  Endings                  3          60.0        87.2         95.1
                           4          89.4        93.2         96.4

  Contractions             3          49.1        61.8         84.1
  & Possessives            4          58.9        77.7         90.7

  Affixes                  3          46.9        54.1         81.1
                           4          61.2        73.6         87.1
                           5          71.5        81.6         88.3
Table 27

Performance Standards for Low, Average, and
High Comprehenders on Phonics Subskills

                                         Comprehension group
Phonics
subskills                Grade        Low        Average       High

Consonants

  Single-letter            1          67.8        71.6         72.3
                           2          72.4        70.4         71.1
                           3          68.1        71.0         71.7

  Clusters                 1          59.4        70.5         73.3
                           2          73.7        74.6         75.7
                           3          70.0        75.1         75.0

  Digraphs                 1          62.         78.0         76.4
                           2          75.         80.0         83.1
                           3          76.0        82.4         84.5

  Other                    1          15.1        22.8         29.6
  (variants)               2          29.8        37.3         46.3
                           3          40.0        46.0         48.0

Vowels

  Long                     1          35.1        44.4         56.4
                           2          60.7        70.8         74.8
                           3          53.3        65.7         78.3

  Short                    1          41.7        44.4         64.2
                           2          66.4        71.2         76.6
                           3          55.6        73.8         80.9

  Clusters                 1          54.4        65.7         76.6
                           2          71.2        84.6         95.7
                           3          71.8        95.4         99.9

  Other single-            1          29.3        35.2         38.3
  letter                   2          46.4        50.5         56.5
                           3          36.1        50.0         64.2
Table 28

Performance Standards for Low, Average, and
High Comprehenders on Structure Subskills

                                         Comprehension group
Structure
subskills                Grade        Low        Average       High

Inflected Endings          2          49.3        70.5         89.0
                           3          60.0        87.2         95.1
                           4          89.4        93.2         96.4

Contractions & Possessives

  Contractions             3          63.9        71.4         86.9
                           4          70.2        81.6         91.0

  Possessives              3          34.3        52.2         81.2
                           4          47.5        73.8         90.4

Affixes

  Prefixes                 3          47.6        53.7         80.9
                           4          63.7        72.7         87.5
                           5          72.9        79.9         89.4

  Suffixes                 3          46.2        54.4         81.3
                           4          58.6        74.4         86.7
                           5          70.0        83.2         87.1
Consonants Subtest, a teacher would know whether a low score could be
attributed to a student's weakness in single-letter consonants, consonant
clusters, consonant digraphs, or in the variant single-letter consonants
and could plan instruction accordingly.

Because the data reported here are based on a single study group,
there are several instances where performance between comprehension groups
does not change in the expected direction. For example, for the
single-letter category of the Consonants Subtest, mean scores for the three
ability groups all cluster at about 70%, with the low group performing
slightly better than the average and high comprehension groups. Similarly,
for Consonant Digraphs, the average group obtained a slightly higher score
than the high comprehension group. While a larger or different population
of subjects might have produced scores in the expected direction, these
discrepancies point out an important facet of skills acquisition: that
is, mastery of specific subskills is not always consistently correlated
with comprehension ability. This interpretation is supported by the
relatively low correlations between subskill performance and comprehension
ability on the two consonants categories reported above (see Table 29
for correlation listings). These data suggest the need for teachers to
be flexible in judging score profiles of individual readers.

The performance guidelines for the Structure Subtests presented in
Table 28 show relatively large variations between grade levels and
between comprehension groups. This variation is not surprising, because
structural analysis skills are taught throughout the middle elementary



Table 29

Pearson Correlations of Subtest Category Score
with Metropolitan Comprehension Scores

Subtest categories

Consonants (N = 339)

  Single-letter                           .2765        .000
  Clusters                                .4248        .000
  Digraphs                                .3674        .000
  Other (variants)                        .4656        .000

Vowels (N = 316)

  Long                                    .4868        .000
  Short                                   .3961        .000
  Clusters                                .5715        .000
  Other single-letter                     .3630        .000

Inflected Endings (N = 306)               .6850        .000

Contractions & Possessives (N = 201)

  Contractions                            .5794        .000
  Possessives                             .5331        .000

Affixes (N = 319)

  Prefixes                                .5502        .000
  Suffixes                                .5997        .000
school years and, therefore, students would acquire mastery at various
stages during this period.

Most criterion-referenced tests and skills management systems argue
for across-the-board mastery cutoff scores of 80 or even 90%. Of the
42 performance standards reported in Table 26, 29 were below 80% and
38 were below 90%. The range of performance was 40.1% correct to 96.4%
correct. Clearly, at times a score of 45% may be satisfactory for some
students. At other times, nothing less than a score of 95% would be
satisfactory.
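
The counts cited above can be checked directly against the Table 26 values. The short sketch below transcribes those standards (Low, Average, High for each grade within a subtest) and tallies how many fall under a given cutoff; the variable and function names are illustrative only.

```python
# Performance standards transcribed from Table 26, listed as
# Low, Average, High for each grade within a subtest.
TABLE_26 = {
    "Consonants": [51.2, 60.7, 62.9, 62.7, 65.6, 69.1, 63.5, 68.6, 69.8],
    "Vowels": [40.1, 47.4, 58.9, 61.2, 69.3, 75.9, 54.2, 71.2, 80.8],
    "Inflected Endings": [49.7, 70.5, 89.0, 60.0, 87.2, 95.1, 89.4, 93.2, 96.4],
    "Contractions & Possessives": [49.1, 61.8, 84.1, 58.9, 77.7, 90.7],
    "Affixes": [46.9, 54.1, 81.1, 61.2, 73.6, 87.1, 71.5, 81.6, 88.3],
}

standards = [s for values in TABLE_26.values() for s in values]


def count_below(cutoff):
    """Number of performance standards that fall below a uniform cutoff."""
    return sum(1 for s in standards if s < cutoff)


print(len(standards), count_below(80), count_below(90))  # 42 29 38
print(min(standards), max(standards))                    # 40.1 96.4
```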

From the inspection of performance guidelines for mastery between
the various structure subskills, it is apparent that applying a uniform
standard of 80, 70, or even 60% would not be appropriate. In the
Contractions & Possessives Subtest, for example, third graders in the
average comprehension group obtained a mean score of 71.4% on
Contractions, but only 52.2% on Possessives. Hence, despite the fact that these
children comprehend at grade level, they would not be regarded as "masters"
of possessive forms according to traditional skills management standards.
The issue to address is not whether students who score 52% on Possessives
are masters of that subskill, but whether instruction aimed at raising
these scores to some absolute standard will enable students to
comprehend better. For this reason, the standards presented in this report
are labeled "performance guidelines," rather than "cutoff scores."

According to Popham (1978), there is no true and definitive cutoff
score and, therefore, the need for further practice must be based on
subjective decisions. Kriewall (1972) argues for the use of proficiency
distributions. This idea is taken further by Block (1972, 1973), Millman
(1973), and Terwilliger (1979), who advocate flexible mastery levels.
Millman suggests that data from "masters" be used to establish standards
and proposes that the cutoff scores vary so that fundamental skills have
higher cutoff points than nonessential skills.

The suggestions of these researchers influenced the development of 
the performance guidelines presented in Tables 26, 27, and 28. Because 
the ultimate goal of reading instruction is reading comprehension, three 
groups representative of the continuum of proficiency in global compre- 
hension were used to establish flexible cutoff scores. By including the 
correlation listings (Table 29) in the mastery decision process, the 
teacher can be reasonably sure that when a low-scoring student is assigned 
to further practice on a skill, the skill is considered essential to com- 
prehension. 

Performance guidelines for the subskills assessed in the Word
Identification Test battery are intended for use with Table 29. These
correlations were computed for each subtest category with scores from the
Reading Subtest of the Metropolitan Achievement Tests. Correlation
information provides an additional source to help teachers make sound
judgments about particular subskill scores. For example, if a
second-grade student of high comprehension ability scores 48% (which is below
the performance standard) on vowel items assessing "other single letter"
vowel correspondences, the teacher can consult the list of correlations
(Table 29). The Pearson correlation of "other single letter" vowels with
comprehension is only .3630. Therefore, rather than automatically
assigning the child to further practice on these correspondences (e.g.,
the o in love), the teacher might decide that other uses of the student's
time at this stage are more beneficial to the child's overall growth in
reading skill.
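
The reasoning in this example can be written out as a small decision aid. The sketch below is hypothetical: the report leaves the judgment to the teacher and names no correlation cutoff, so the `min_r` threshold and the wording of the suggestions are assumptions made purely for illustration. The correlations are the vowel entries from Table 29, and the standard of 56.5 is the Table 27 value for second grade, high comprehension, other single-letter vowels.

```python
# Correlations with comprehension, from Table 29 (vowel categories only).
CORRELATION = {
    "long": 0.4868,
    "short": 0.3961,
    "clusters": 0.5715,
    "other single-letter": 0.3630,
}


def suggest(score, standard, category, min_r=0.40):
    """Return a suggestion for the teacher, not a verdict.

    min_r is a hypothetical threshold; the report specifies no cutoff
    and leaves this judgment to the teacher.
    """
    if score >= standard:
        return "meets guideline"
    if CORRELATION[category] < min_r:
        return "below guideline; weakly related to comprehension"
    return "below guideline; consider further practice"


# Second grader, high comprehension group: score 48, standard 56.5.
print(suggest(48, 56.5, "other single-letter"))
```

With these assumed settings the call prints "below guideline; weakly related to comprehension", which mirrors the teacher's option to spend the student's time elsewhere.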



CONCLUSIONS 



The Word Identification Test battery was designed with attention
to the major issues pertaining to skills mastery and assessment that
are raised in the review of mastery learning. There were five important
areas of concern in the development of the battery: (a) the basis on
which target skills would be selected for inclusion; (b) the
facilitation of error analysis by creating categorical distractors; (c) the ease
and efficiency of test administration; (d) the independence of the test
battery from any published set of materials to lessen the likelihood of
teachers teaching to the tests; and (e) the establishment of flexible
standards for skills mastery (performance guidelines) based on a global
measure of comprehension, rather than on arbitrary cutoff scores. The
manner in which each of these issues was resolved in developing the Word
Identification Test battery is summarized below.

First, all five subtests comprising the battery were developed in
accordance with the particular subskills that the widely used reading
programs teach. Only the most frequently occurring elements of language
(based on frequency data) were selected as targets for assessment.
Details of the subtest specifications are documented in the Interim Report:
The Refinement of the Test Battery to Assess Word Identification Skills
(Johnson, Pittelman, Schwenker, & Shriberg, 1980). Hence, the target
items selected for assessment were ecologically based.

Second, the battery facilitates teachers' use of error analysis
information by its use of distractor categories. Using the key to distractor
categories on the Phonics Test, for example, a teacher can analyze the
error patterns of individual students with regard to visual confusion,
auditory confusion, and random guessing.

Third, the battery was developed to ease the burden of classroom 
evaluation. Because more testing time generally means less teaching 
time, the subtests were designed for efficiency and for group administra- 
tion. The tests are accompanied by administration manuals which include 
clear directions, illustrative examples, and practice items to prepare 
students for each test. Each of the subtests is administered separately. 

Fourth, the subtests were developed independent of any particular
set of materials used for teaching reading. Therefore, the likelihood
of teachers teaching to the test was minimized. This is a common problem
in schools where a single skills management series is used for both
assessment and instruction.

Finally, the battery provides flexible standards for skill mastery.
Based on a student's level of comprehension, a teacher can select the
appropriate criterion score at the student's grade level for each
subtest and for each skills category in the battery.

The Word Identification Test battery is a valid and reliable
instrument. Although it is easy to administer, all subtests in the battery
have considerable scope. The subtests enable educators to make
accurate diagnostic decisions about the apportionment of instructional time
on the most frequently occurring phonics and structural elements.
Performance standards are provided for each subtest in the battery. The
performance guidelines range from 34.3 to 96.4%, depending on the
subskill being measured and the grade level and comprehension ability of
the student. More eloquently than any argument appearing in the
literature, this range of expected performance demonstrates the
inappropriateness of arbitrarily established, rigid mastery scores. The establishment
of empirically based performance guidelines for the Word Identification
Test battery, on the other hand, represents a flexible and innovative
solution to the issue of skills mastery.